So I have the above data model.
A (highest granularity) has multiple rows of B, and B has multiple rows of C (lowest granularity). See the data structure below.
I'm trying to make measures that can calculate between different granularities.
Let's say I have a fact called 'C Feit' in table 'Feit C Level'. I can then simply use SUM ( 'Feit C Level'[C Feit] ) to calculate the values for higher granularities.
But I'm stuck with doing the opposite. I obviously need to divide by the count of rows, but I can't figure out how to make it work.
Let's say for these fact values:
A Index 1 = 100
A Index 2 = 80
A Index 3 = 60
A Index 4 = 40
The following fact values should be calculated:
C Index 1 = 25
C Index 2 = 25
C Index 3 = 25
C Index 4 = 25
C Index 5 = 20
C Index 6 = 20
C Index 7 = 20
C Index 8 = 20
And so on...
Anyone? Thanks in advance!
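For reference, a minimal DAX sketch of the allocation asked for here, assuming dimension tables named 'A', 'B' and 'C' chained by one-to-many relationships and a hypothetical A-level fact table 'Feit A Level'[A Feit] (that table name is a guess, not something from the model):

C Feit from A :=
SUMX (
    VALUES ( 'A'[A Index] ),                             -- one pass per A member in scope
    VAR AValue =
        CALCULATE ( SUM ( 'Feit A Level'[A Feit] ) )     -- A-level amount for this A
    VAR TotalCRows =
        CALCULATE (
            COUNTROWS ( 'C' ),
            REMOVEFILTERS ( 'B' ),
            REMOVEFILTERS ( 'C' )
        )                                                -- all C rows under this A
    VAR VisibleCRows =
        CALCULATE ( COUNTROWS ( 'C' ) )                  -- C rows under this A in the current filter
    RETURN
        DIVIDE ( AValue, TotalCRows ) * VisibleCRows
)

With the sample numbers above, each of the four C rows under A Index 1 shows 100 / 4 = 25, and the measure still totals 100 at the A level.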
Hi there.
The key to good Power BI reports is a sound and simple model. Power Query (M) can also be optimized like DAX, but one has to know HOW. Please read the documentation on the M language, read about the eager and lazy evaluation rules, and apply them wisely. A good source of information on Power Query and M is Chris Webb's blog, and also this tutorial.
I have read all of this and know how to optimize M so that it's blazingly fast. I suggest you do the same. Moving data mashup operations into DAX is never the way to go unless... you want to get into deep, deep trouble later on; it's like a time bomb that will blow up sooner rather than later.
Also, please note that if you don't use a dimensional design (separating dims and facts), you'll sooner or later be producing wrong numbers. This is due to the fact that SUMMARIZECOLUMNS optimizes the retrieval of attributes from tables using the auto-exist feature (read up on auto-exist in DAX on www.sqlbi.com; just use the search engine on the page).
Secondly, the golden rule of dimensional modeling is this:
Your dimensions should be short but wide and your fact tables should be narrow but long.
There's NO EXCEPTION TO THIS.
Now, please look at the attached file. It has a model as it should be set up and enables you to calculate everything you need simply and fast. This IS exactly how it should be done. You won't get anything better in terms of speed and simplicity.
Please try to learn how to optimize operations in M. Do not shoot yourself in the foot.
Best
D
@Anonymous This model is just a simple model for testing purposes. I want to find a DAX way of calculating between fact tables with different granularity. I know the solution for going from lower grain to higher grain is SUM; I'm looking for the opposite, from higher grain to lower grain.
I am coming from here. I am super confused because everywhere on the Internet I read different things about making a star schema design. Some say merge as much as possible into as few tables as possible; others say keep the dimension tables apart - and then I can't create a hierarchy anymore in Power BI... From the comments in my linked topic above I concluded I should keep the dim tables apart...
I have fact tables at different granularities; some data is available in one fact table while it should be available in the other, and vice versa. One approach would be to bring everything down to the lowest grain in Power Query, but then I get very long-running refreshes and it becomes difficult to maintain. I want to be able to easily add future tables and columns and refresh only those parts. So my current philosophy is to keep separate tables, and when something is available at a higher granularity I divide it in a way that fills the lower-granularity tables, and vice versa...
Previously I tried to do it all with one big table, or maybe 2, but my Power Query code got huge and unmaintainable. If I wanted to do something simple like introduce a new column, I would have to verify and change all the steps in Power Query. Every time a new feature was requested it was too much work in PQ. By doing less in PQ and more in DAX I think I can be more flexible and reduce refresh times...
Looking forward to hearing back.
@Anonymous Thank you very much for your extensive reply. I finally feel like I am close to the point where I understand what path I need to take.
The funny thing is that I used to do all this in PQ, but then somehow my queries became too long and complicated because I was doing too much in PQ. There is a fine line where you have to stop in PQ and move on to DAX. With your help I might finally be able to understand where that fine line is.
I'm still having some internal debates though. With the bullet points below I will try to make them clear. Could you help me verify that I'm going in the right direction by going through Q1 to Q9?
= Table.AddColumn(#"Changed Type",
"A Divisor",
(r) => Table.RowCount(Table.SelectRows(#"Changed Type", each r[A Index] = _[A Index]))
)
I've been struggling with the whole balance between all of this for ages. I think if you could answer these 9 questions I would finally have a good starting point that leads me in the right direction. Your feedback would be much appreciated!
* Could I say that whatever needs to be done to get a good data model to the lowest grain should be done in PQ, and all the rest in DAX? Q1
> Yes. That's right. Power Query is a lot more powerful than DAX with respect to data mashup operations. DAX is a Data Analysis eXpressions language. M is designed specifically for data shaping.
* I should separate facts from dimensions. This is clear to me.
> Always. Even with the smallest of models. If you want to know why, please read this https://www.sqlbi.com/articles/understanding-dax-auto-exist/. The problem with auto-exist is that if you don't do it correctly, you'll be producing wrong numbers without even knowing it.
* I should bring all data in the fact tables down to the lowest grain as much as possible with PQ, not within DAX. This concept is still not totally clear to me. I always read that for star schema design, fact tables with different granularity (for example Budgets vs Sales) should be in separate tables. But you are saying I should merge them. And so you did in the pbix: you merged A, B and C. Can you clarify this? What if A was Budget and B was Sales? Q2
> This depends. In your case, there's no need to have different granularities because you can calculate everything perfectly well from the lowest level up. Remember that you want to have as few tables as possible BUT NOT FEWER. It's like Einstein's: "Everything needs to be as simple as possible but not simpler." If you have a situation where there's a Sales table and a Budget one, then you want to have 2 different fact tables because you are modeling 2 different processes. And even then you probably want to do allocation so that both tables have the same grain and measures are simpler. But of course, you can have it calculated at the desired grain as well.
* What I didn't do before - but seems smart - is temporarily store a denominator column for every level in the hierarchy that I want to bring down to the lowest grain. Although I've made PQ functions before, the code above seems very alien and smart to me. Could you please explain how it works? Q3
> "Each" is a special syntactic sugar for functions. Exactly the same way "_" is. (r) below is the current row of the first argument under Table.AddColumn as the rows get iterated but then I have to have another function under Table.SelectRows that will pick up only those rows for which A Index is the same as the current row's A Index from the previous function Table.AddColumn. I have to have a means of referring to the correct object, hence (r) and _.
* I don't even see a division going on here. And I cannot find anything on bengribaudo.com on r[] = _[]. All I know is that _ means 'each'. Before I would have used Group By to get the Divisor column, but this seems much faster and simpler. Is it faster than Group By? Q4
> No idea if it's faster. Only testing can tell. Bear in mind, though, that grouping requires a lot of memory whereas the code above is a kind of serial scan. It's the same situation with a hash join and a nested-loop join in SQL: they both have their uses, but one has to know when to use which to get the most out of them. Same with M.
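For comparison, a sketch of the Group By route, written as extra steps on the same query (step names are illustrative and assume the #"Changed Type" step from the formula above):

    // count rows per A Index
    Counts = Table.Group(#"Changed Type", {"A Index"}, {{"A Divisor", each Table.RowCount(_), Int64.Type}}),
    // join the counts back onto the detail rows and expand the divisor column
    Joined = Table.NestedJoin(#"Changed Type", {"A Index"}, Counts, {"A Index"}, "Counts", JoinKind.Inner),
    Result = Table.ExpandTableColumn(Joined, "Counts", {"A Divisor"})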
* Just a small question, what do you mean by the word 'fait'? I'm not familiar with this. Q5
> I don't mean anything 🙂 I took the word from your file. You can do whatever you want with it 🙂 I might have made a typo as well. Don't dwell on this as it's of no meaning at all.
* I should keep dimension tables in separate tables, even though a hierarchy can be seen in them.
I see that in the pbix you merged the fact tables, but you kept the dimension tables apart. This does not make hierarchies for dimensions possible, as far as I know. Did you do this on purpose? Q6
> If you have a natural hierarchy, you can certainly put the attributes in one dimension. Yes, you can merge the 3 dims I created into one but then you have to make sure you create correct keys for the entries in the fact table, of course.
* I know that if I merge the dimensions it messes up the cardinality for the relationships. But I'm wondering how I can do something with hierarchies within Power BI if keeping dimensions separate makes this impossible - or do you see this as a small sacrifice for the greater gain? Q7
> The grain must always be the lowest. There's no escape from this. You don't have to have separate dims as pointed out above.
* One of my biggest challenges is the situation below, and that's how I got my idea for fixing grain partly within DAX. Could you tell me if my reasoning below makes sense or where I should do something different? Q8
- There is an old application with an underlying database.
- The old application can and will not be developed anymore.
- The underlying database has no write possibilities.
We want to add more data to do more analyses, while the old application is still used to enter data.
My approach:
- I want to put the data from that underlying database in a datamodel in Power BI.
- I want to put heavy queries - those that do the merges, grain fixes, and restructuring of the data for a star schema - in Dataflows.
- I want to create a Dataset with these Dataflows loaded and I want users to be able to add data to this dataset with simple Excel files and later I'll replace them with PowerApps.
- I want it so that when a user adds a small piece of data, say a column or a table, and clicks refresh, only that part of the data model is refreshed, and not the big underlying dataflows (in these dataflows I do the merges and grain fixes like you said).
However, the challenge is: this 'added data' will be added at a higher granularity through Excel. But in the Dataflows I already brought everything to the lowest granularity with your technique (or my old Group By technique). Therefore I was thinking of not merging the 'added tables' with the Dataflows (slow?), but keeping them as separate tables in the datamodel, and simply making relationships between the keys. When having the mixed model of big Dataflows and small 'added tables', I could then bring the grain of the 'added data' down to the lowest grain with DAX (my initial question in this topic). I think this would make refreshes much faster for that small portion of 'added data'. Do you think this is a good approach or would you advise me to merge every 'added data' with the Dataflows from the underlying database? (Basically extending on the work I did in the dataflows by also merging the 'added data' in a Dataset) Q9
> As for this last piece... I have experience with SSIS, not with Dataflows, but I guess one day I'll have to dive into this as well. SSIS can be - if properly designed - extremely fast (it holds the world record of data processing speed) and I've never had issues with data loading or adding to some existing tables working in an enterprise-grade data warehouse environment. When you add a column to a table in the source system and then bring it into Power BI, as far as I know you have to process the whole table anyway. There's no partial processing in this case. You do have partial processing when you add rows to partitions, but that's not the case here. If data is added at a lower granularity, you don't have a choice really: you have to transform it so that it fits into the highest grain. Bear in mind, please, that "lower granularity" means "fewer distinct values." I wouldn't do what you want to do in the last point. This will make your model unmanageable in no time if you start adding small tables all over the place and start creating relationships; this will violate the rule mentioned above (use Ockham's Razor when building any solution for anything). Please integrate your new data into the current structure and optimize Power Query or Dataflows for speed. This is certainly doable. Also, please stay away from calculated columns in DAX because such columns do not get optimal compression and hence the calculations will slow down. If I were to do what you do, I would do all I could to integrate new data into the current model without creating any unnecessary tables. M is a great language and you can get great speeds from it. You just have to know how to code. I've given you some materials to go through (especially Chris Webb is good at this and you could even email him about any problems with M optimization). I know it's an upfront investment... but you'll be glad you did it in no time. Trust me.
Best
D
Once again thanks for the reply.
* This depends. In your case, there's no need to have different granularities because you can calculate everything perfectly well from the lowest level up. Remember that you want to have as few tables as possible BUT NOT FEWER. It's like Einstein's: "Everything needs to be as simple as possible but not simpler." If you have a situation where there's a Sales table and a Budget one, then you want to have 2 different fact tables because you are modeling 2 different processes. And even then you probably want to do allocation so that both tables have the same grain and measures are simpler. But of course, you can have it calculated at the desired grain as well.
> For me it's still hard to decide when processes can be merged or not. But I have a feeling that below reasoning might be valid?
* Bear in mind, please, that "Lower granularity" means "fewer distinct values."
> So...
Lower grain more detail, higher grain less detail.
Lower granularity less detail, higher granularity more detail.
Confusing haha.
* For me it's still hard to decide when processes can be merged or not. But I have a feeling that below reasoning might be valid?
> You can mix two different processes in one fact table only when they share the lowest grain and at least most of their attributes. That should be obvious. But the fact that 2 processes share the same grain (say, they are being registered on a daily basis) does not mean you should keep them in one table. You probably should have separate ones. This all depends on factors like the ease of coding, maintainability(!), availability of data, processing speed, and cardinality of the tables/columns. And, of course, the speed of DAX. But generally, you should have separate fact tables for separate, well-defined processes. The smaller the fact table, the faster the code. The fewer filters you have to put in a DAX measure, the better. This is what should guide you. Users are not interested in how much space it takes to store the data they query. They are concerned with 2 things only: the speed of retrieval and the accuracy of the result. You should always have this in the back of your mind when you create models and code.
* Could I say that it's all right to merge processes, e.g. Budget and Sales, when it's okay to forecast Budget down to the lowest grain, but when the opposite is true - when it's not okay to forecast Budget down to the lowest grain - then it's better to keep them apart, because only then do users of the report know which level of data they can trust (in this case it's better to show that data is not available at a certain grain)? So when the reports have a forecasting nature it's better to merge processes, and when data should be trusted 100% for availability and trustability it's better to keep them apart?
> Well, my take on this is that Sales and Budget should always be separate whatever the grain. For the reasons I've highlighted above. And it has nothing to do with trustability. Data must always be trustworthy. If this is not true, you've failed to deliver and I would sack you in no time 🙂 DAX lets you show measures when they need to be shown and hide them (display blanks) when it makes no sense to show them. It's like maths and physics. Maths extends far beyond its applications to the real world but to solve a real problem you only need some part of it. The one that you can give a sound interpretation to. The rest is there for... say, philosophical reasons, and exercising your brain (which is no less important than solving problems in the real world).
* And probably I have to add the following footnote? When data creation in separate processes happens in a predictable chronological order it's okay to merge the processes, but when data creation happens in a random order it's probably better to keep the tables apart?
> I don't fully get what you mean. I never separate data based on how it's being created/delivered. I separate data based on logic and technical requirements solely. The source does not matter in the slightest.
Best
D
Thanks for the explanation, but it's still not totally clear to me. Maybe a practical example is better to explain where my lack of understanding is.
Let's say there is a factory.
One process is the planning of the production of a product.
Product | Production Date | Production Location |
A | 1-1-2020 | L1 |
B | 2-1-2020 | L2 |
In an earlier process a project for production is calculated. Let's say there is 1 Project (there can be more); it consists of Subprojects, and each Subproject consists of 2 products.
Project | Subproject | Predicted Subproject Costs |
1 | 1 | 100 |
1 | 2 | 200 |
1 | 3 | 40 |
1 | 4 | 100 |
At this point I want to predict the Costs over Time according to the production planning. However, this production planning is at Product level. So I'll simply divide the Predicted Subproject Costs by the number of Products in each Subproject, like this:
Project | Subproject | Product | Predicted Product Costs |
1 | 1 | A | 50 |
1 | 1 | B | 50 |
1 | 2 | C | 100 |
1 | 2 | D | 100 |
1 | 3 | E | 20 |
1 | 3 | F | 20 |
1 | 4 | G | 50 |
1 | 4 | H | 50 |
Then this project will go through detailed engineering and the final costs for each individual product can be determined. Let's say it would turn out to be something like this:
Project | Subproject | Product | Final Product Costs |
1 | 1 | A | 55 |
1 | 1 | B | 45 |
1 | 2 | C | 120 |
1 | 2 | D | 110 |
1 | 3 | E | 30 |
1 | 3 | F | 35 |
1 | 4 | G | 50 |
1 | 4 | H | 45 |
1 | 4 | I | 55 |
1 | 4 | J | 50 |
There will always be a mixed reality of:
- Products in the stage after Calculation: Predicted Product Costs
- Products in the stage after Engineering: Final Product Costs
Let's say one wish is to show the Costs over Time (according to the production planning). Two questions:
- I don't see why I wouldn't merge the tables with Predicted Costs (Budget) and Final Costs (Sales) and simply add a Stage Column to identify whether that row is Predicted or Final. What would you do?
- Also I'm wondering if I should use 3 separate dimension tables for Project, Subproject and Product? Or one Product dimension table with a hierarchy?
Hi. Well, I would probably have a Product dimension on its own and a Project dimension (with subprojects as a column) and create a relationship between them: 1 project - many products, so that would be a snowflake dimension. However, there is a different way to model this. You could create a factless fact table that would only store the connections between SubprojectID and ProductID. By doing this you would eliminate the snowflake (which is what I would most likely do since I like pure star schemas).
If you want to show a product's progress through different stages of production, then it's easy to have a table that for each product will keep track of the stages through time. You can certainly store the final product cost as well as the predicted cost in the Product table. This is because these amounts do not change over time; they're fixed to the product. But progressing through time requires keeping track of stages and some measurements that pertain to the steps. Hence the need for a different table. Of course, if all you know about a product is just its unique name... then you can store it in the table above. But as soon as you start adding attributes to products, it calls for a separate business entity and the factless fact table. Bear in mind that Projects could also store more information about individual projects and that would - again - immediately call for a separate dimension.
So, in a word, in very simple models you could get away with very simple (if not simplistic) tables. But as soon as you start adding attributes to entities to describe them, this will start becoming awkward to code against and problematic for many other reasons I don't have time and space here to mention. Therefore, I always stick to the proper dimensional design, even for the simplest of models.
By the way, if you need a hierarchy and the fields are in different tables, you can always bring the needed fields as columns into the table of your choosing and create the hierarchy there. For instance, in the Product table you could have 2 columns brought over from Projects: Project Name and Subproject Name. For this you could use - depending on the structure, of course - RELATED.
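A minimal sketch of those RELATED columns, assuming the snowflake variant described above, with a Product table related many-to-one to a Project table that carries both names (table and column names here are placeholders):

// calculated columns on the Product table
Project Name = RELATED ( 'Project'[Project Name] )
Subproject Name = RELATED ( 'Project'[Subproject Name] )

With those two columns in place you can build a Project > Subproject > Product hierarchy entirely inside the Product table.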
Best
D
These are such interesting questions.
It strikes me that the data model might become more workable if you thought of the different cost factors as "events" that happen to a product.
So you would have a dimension table that had columns:
Project
Sub-Project
Product
It looks like, from your example, those would be unique records and that a Product is not produced in more than one Project/Subproject.
Then your fact table would have these columns:
Product
Date
Cost Type: Predicted, Actual
Cost
I think you could do the calculations you need from those.
Depending on the reports you want, you might construct a hierarchy in the Product dimension table.
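A minimal sketch of measures over such a fact table (the table name 'Cost Events' and the COALESCE fallback are assumptions, not something prescribed above):

Predicted Cost := CALCULATE ( SUM ( 'Cost Events'[Cost] ), 'Cost Events'[Cost Type] = "Predicted" )
Actual Cost := CALCULATE ( SUM ( 'Cost Events'[Cost] ), 'Cost Events'[Cost Type] = "Actual" )
Cost over Time := COALESCE ( [Actual Cost], [Predicted Cost] )  -- actuals where engineering is done, otherwise the prediction

Sliced by the Date column (or a related Date dimension), [Cost over Time] gives the costs over the production planning.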
I appreciate how confusing it is. If you'd like to do a screen share I'll walk you through changing your data to a Star Schema. Just send me an email with a good time to talk. ken@8thfold.com
DAX provides a set of functions for creating parent/child hierarchies of arbitrary depth. Take a look at the pattern. https://www.daxpatterns.com/parent-child-hierarchies/ Once you have the hierarchy I think it will be easier to do the calculations you need.
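That pattern relies on DAX's PATH functions; a minimal sketch as calculated columns (the 'Entity'[ID] / 'Entity'[ParentID] names are placeholders):

Path = PATH ( 'Entity'[ID], 'Entity'[ParentID] )      -- pipe-delimited list of ancestors
Level1 = PATHITEM ( [Path], 1, INTEGER )              -- top of the hierarchy
Level2 = PATHITEM ( [Path], 2, INTEGER )              -- second level, and so on per level
Depth = PATHLENGTH ( [Path] )                         -- how deep this row sits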