Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Greg_Deckler

When Best Practices Aren't

Best practices can be wonderful things when the rational behind them and their nuances are fully understood. However, applying best practices blindly can sometimes get you into trouble. For starters, best practices are most often general in nature and do not apply to every particular situation. In addition, best practices can sometimes conflict with one another. In this article, I explore two "best practices" that are often quoted within the Power BI community and demonstrate their nuances. Specifically, Roche's Maxim and, in particular, using SUMMARIZE to create calculated columns.

 

Roche's Maxim

Simply stated, Roche's Maxim states:

 

"Data should be transformed as far upstream as possible, and as far downstream as necessary."

 

Now, as best practices go, this is a pretty good one. However, there are still nuances. First, it's distinctly possible that a report developer might not have access to the source system to make changes. Getting the changes made at the source might take weeks or even months depending of the size of the organization. While it is certainly possible to get the transformations made at the source, in the interests of speed it might make more sense to make the data transformations in Power Query then rather than hold up the process.

 

But, even Power Query might not be the best answer either in all cases. For example, perhaps there is a median calculation that needs to occur. Median calculations tend to break query folding. So, one can imagine a scenario where adding the calculation in Power Query breaks query folding and actually makes scheduled refreshes less performant than if the calculation was just done in DAX, assuming that fewer ills are encountered when doing so.

 

Then there is the subject of competing best practices. Another best practice is that for maintainability and supportability reasons it's generally a best practice to write all code in as few languages as possible. If there are already going to be lots of DAX calculations within the report anyway and this is the only calculation that would occur in Power Query, then maybe a better path is to keep all of the calculations in a single coding language, DAX.

 

As I said, nuances. But, hopefully this illustrates some of the reasons why one cannot just go around applying best practices blindly without knowing the specific situation. Let's now turn our attention to adding columns in DAX SUMMARIZE expressions.

 

Adding Columns in SUMMARIZE

So here is one that I see all the times in the forums. Myself or someone else will answer a question where  SUMMARIZE is used and within that SUMMARIZE expression is the addition of a column, like so:

Measure =
  SUMX(
    SUMMARIZE(
      'Sales',
      'Product'[Color],
      "Sales", SUM('Sales'[SalesAmount])
    ),
    [Sales]
  )

Quite frequently, some individual will come along and quote a "best practice" from this article that says that adding columns within a SUMMARIZE is always bad because it can lead to unexpected results. Not wrong results mind you,  but unexpected.

 

But let's dig into this "best practice". First, unlike Roche's Maxim, which I would feel confident in saying is a "generally accepted best practice". This "best practice" with SUMMARIZE is only something I've seen espoused by an extremely limited number of people. It's certainly not part of any official documentation on SUMMARIZE and I've seen nothing from Microsoft that states this as a best practice.

 

Futhermore, the proposed alternatives either don't really work or are cumbersome and kind of nonsensical in nature. For example, SUMMARIZECOLUMNS pretty much doesn't work in anything outside of a Card visual. Create a table or matrix and add any column and the thing up and dies with the error: 'SummarizeColumns() and AddMissingItems() may not be used in this context.' Not useful.

 

The other alternative results in a kind of bizarre construct. Essentially to write the measure using "best practice" would be:

Measure =
  SUMX(
    ADDCOLUMNS(
      SUMMARIZE(
        'Sales',
        'Product'[Color]
      )
      "Sales", CALCULATE(SUM('Sales'[SalesAmount]))
    ),
    [Sales]
  )

Yep, a seemingly random CALCULATE with no filter clause. And yes, we all know that this is because of how CALCULATE translates row context to filter context, etc. But intuitive? Not in the slightest. There's an even more unintuitive version using GENERATE and no ADDCOLUMNS but I think the point has been made.

 

But here's the thing. If you actually read the entire article cited above regarding this "best practice", you actually realize that the unexpected results only occur when using CALCULATE within the added column within the SUMMARIZE. This is because of the way that CALCULATE's context switching and the clusters created by SUMMARIZE interact with one another.

 

The solution? Simple. Don't use CALCULATE when adding columns within a SUMMARIZE expression. Duh. Why isn't that the "best practice"? Unless someone can come to me with a concrete example where not using CALCULATE when adding a column within a SUMMARIZE expression returns a wonky, wrong or unexpected result, I just don't see the point of throwing the baby out with the bath water.

 

Conclusion

Knowing about best practices is important. Knowing the nuances, rational and when to apply best practices is more important. Knowing the difference between a best practice and an opinion is perhaps the most important.

 

Comments

Interesting stuff - thanks for writing!

I'm still learning DAX, but in this example:

Measure =
  SUMX(
    ADDCOLUMNS(
      SUMMARIZE(
        'Sales',
        'Product'[Color]
      )
      "Sales", CALCULATE(SUM('Sales'[SalesAmount]))
    ),
    [Sales]
  )

Would you still require the calculate if the sum of SalesAmount was already in a measure (i.e. calling a calcuate behind the scenes)?


@robinuk Not sure why you would need the CALCULATE there. Seems superfluous. Also, if you have a measure that is just SUM('Sales'[Amount]) then you wouldn't need the CALCULATE either.