DAX Filtering calculation by first in partitioned ...

scriptpup · 01-23-2020

This is a somewhat complex situation, but the basic need is pretty simple. As I said in the title, I'm looking to return a count based on the filtered partitions.

In SQL it looks like this:

SELECT
	[Tier 1]
	,COUNT(1) "Tasks"
FROM (
	SELECT
		tsk.[Tier 1]
		,ROW_NUMBER() OVER (PARTITION BY Case_ID ORDER BY CRT_DTS DESC) rw
	FROM someTasks tsk
) partitioned 
WHERE
	rw = 1
GROUP BY
	[Tier 1]

If I then want to show what the 'latest' task tier was as of a given date (yyyy-mm-dd), I just add a WHERE clause to the inner query. Simple.

In DAX I came up with the following:

SUMMARIZE (
        Task,
        Task[Tier 1],
        "Profiled", CALCULATE (
            COUNTROWS ( Task ),
            TREATAS (
                SELECTCOLUMNS (
                    SUMMARIZE (
                        Task,
                        Task[CASE_ID],
                        "MaxTask", CALCULATE (
                            MAX ( Task[TASK_ID] ),
                            FILTER ( Task, Task[CRT_DTS] = MAX ( Task[CRT_DTS] ) )
                        )
                    ),
                    "MaxTask", [MaxTask]
                ),
                Task[TASK_ID]
            )
        ),
        "Tasks", COUNTROWS ( Task )
    )

This works, but it is extremely slow (20 seconds to run in DAX Studio) and seems inefficient, even to someone as inexperienced with DAX as I am.

Is there a better, more optimized, way to do this?

ImkeF · 01-23-2020

Hi @scriptpup

As a first step, please use ADDCOLUMNS for the column to add. Use SUMMARIZE only for the grouping: https://www.sqlbi.com/blog/marco/2012/09/04/optimize-summarize-with-addcolumns-in-dax-ssas-tabular-d...

I haven't studied the logic closely, but nested iterators can be a bit slow.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

scriptpup · 01-23-2020

Thanks @ImkeF, I changed the function to look like this:

EVALUATE
SUMMARIZE (
        Task,
        Task[Tier 1],
        "Profiled", CALCULATE (
            COUNTROWS ( Task ),
            TREATAS (
                SELECTCOLUMNS (
                    ADDCOLUMNS (
                        SUMMARIZE ( Task, Task[CASE_ID] ),
                        "MaxTask", CALCULATE (
                            MAX ( Task[TASK_ID] ),
                            FILTER ( Task, Task[CRT_DTS] = MAX ( Task[CRT_DTS] ) )
                        )
                    ),
                    "MaxTask", [MaxTask]
                ),
                Task[TASK_ID]
            )
        ),
        "Tasks", COUNTROWS ( Task )
    )

But I don't see any significant performance gain (it still takes around 20 seconds on average).

Edit: it's also now returning '1' for every row, so the ADDCOLUMNS version isn't doing the same thing as the original, either.
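
A possible explanation for the '1' result, offered as a sketch rather than a confirmed diagnosis: SUMMARIZE evaluates its added columns in the filter context of each group, so in the original query the FILTER over Task only saw the current case. ADDCOLUMNS provides only a row context, and CALCULATE evaluates its filter arguments before context transition, so the FILTER over Task would see every case in the current tier and keep only rows at that tier's overall latest date. Capturing the date in a variable after context transition is one way that might restore the per-case behavior. The name LatestDate is hypothetical; the column names are taken from the query above.

```dax
EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Task, Task[CASE_ID] ),
    "MaxTask",
        -- The outer CALCULATE turns the current CASE_ID row into a filter
        CALCULATE (
            -- Evaluated per case, because the case filter is now active
            VAR LatestDate = MAX ( Task[CRT_DTS] )
            RETURN
                -- Highest TASK_ID among the case's tasks at its latest date
                CALCULATE ( MAX ( Task[TASK_ID] ), Task[CRT_DTS] = LatestDate )
        )
)
```

Running only this inner table in DAX Studio should make it easy to check whether each case now gets its own MaxTask before plugging it back into the TREATAS.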

ImkeF · 01-23-2020

Hi @scriptpup,

Could you please share some sample data so I can understand what you're trying to achieve?

scriptpup · 01-27-2020

@ImkeF

It's somewhat difficult to provide sample data since everything I'm working with right now is classified... But as an example, let's say we have these tables:

1) Base

Case_Id
, ... Lots of unimportant columns

2) Task

Task_Id
,Case_Id
,Task_Create_Date
,Tier 1

Both of these tables will contain transactional data, one record per Task in the Task table and one record per case for the Base table.

I have a relationship between 1 and 2 on Case_Id. My desired end table would look like this:

Tier 1	Tasks	Profiled
Some_Category	800	300
Some_OtherCategory	55	45
Some_Final_Category	129	66
Etc.	930	699

The Profiled column should count ONLY the most recent task for each case, using the Task_Create_Date column as the indicator of what is most recent.

I hope this helps. Thanks!
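
For the result described above, one possible approach is to compute each case's latest timestamp once and then apply it as a filter with TREATAS. This is a minimal sketch, not a tested solution: it assumes the column names from the earlier DAX (Task[CASE_ID], Task[CRT_DTS], Task[Tier 1]) rather than the names in the sample schema, and LatestPerCase and @LatestDate are hypothetical names introduced here.

```dax
DEFINE
    -- One row per case with that case's latest timestamp.
    -- Defined at query level, so it is computed across all tiers,
    -- like the inner query in the SQL version.
    VAR LatestPerCase =
        ADDCOLUMNS (
            VALUES ( Task[CASE_ID] ),
            "@LatestDate", CALCULATE ( MAX ( Task[CRT_DTS] ) )
        )
EVALUATE
ADDCOLUMNS (
    VALUES ( Task[Tier 1] ),
    -- All tasks in the tier
    "Tasks", CALCULATE ( COUNTROWS ( Task ) ),
    -- Only tasks whose (case, timestamp) pair is the latest for its case
    "Profiled",
        CALCULATE (
            COUNTROWS ( Task ),
            TREATAS ( LatestPerCase, Task[CASE_ID], Task[CRT_DTS] )
        )
)
```

Unlike ROW_NUMBER in the SQL version, this counts every task that shares a case's latest timestamp, so ties would need a tiebreaker such as the highest TASK_ID. For the "as of" variant, a predicate such as Task[CRT_DTS] <= DATE ( 2020, 1, 23 ) could be added as a second argument to the CALCULATE inside LatestPerCase. Whether this is actually faster than the 20-second query would need to be measured in DAX Studio.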