scriptpup
Frequent Visitor

DAX Filtering calculation by first in partitioned group - Ways to optimize?

This is a somewhat complex situation, but the basic need is simple. As the title says, I want a count that includes only the first row of each partitioned group.

 

In SQL it looks like this:

SELECT
	[Tier 1]
	,COUNT(1) "Tasks"
FROM (
	SELECT
		tsk.[Tier 1]
		,ROW_NUMBER() OVER (PARTITION BY Case_ID ORDER BY CRT_DTS DESC) rw
	FROM someTasks tsk
) partitioned 
WHERE
	rw = 1
GROUP BY
	[Tier 1]

Then, if I want to change the date range to show what the latest task tier was as of a given date (yyyy-mm-dd), I just add a WHERE clause to the inner query. Simple.
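
For anyone who wants to try the as-of-date variant without access to the real data, here is a minimal sketch that runs the same query shape against a few invented rows using Python's built-in sqlite3 module. The rows, the `latest_tier_counts` helper, and the `:as_of` parameter name are all hypothetical, and it assumes a SQLite build with window-function support (3.25 or later) and dates stored as ISO yyyy-mm-dd text.

```python
import sqlite3

# Hypothetical mock rows: (Task_ID, Case_ID, CRT_DTS, Tier 1)
ROWS = [
    (1, 100, "2024-01-05", "A"),
    (2, 100, "2024-02-10", "B"),
    (3, 200, "2024-01-20", "A"),
    (4, 200, "2024-03-01", "C"),
    (5, 300, "2024-02-15", "B"),
]

# Same shape as the query in the post, plus a WHERE in the inner query.
QUERY = """
SELECT [Tier 1], COUNT(1) AS Tasks
FROM (
    SELECT
        tsk.[Tier 1],
        ROW_NUMBER() OVER (PARTITION BY Case_ID ORDER BY CRT_DTS DESC) AS rw
    FROM someTasks tsk
    WHERE tsk.CRT_DTS <= :as_of   -- the added as-of-date filter
) partitioned
WHERE rw = 1
GROUP BY [Tier 1]
ORDER BY [Tier 1]
"""


def latest_tier_counts(as_of):
    """Count cases by the tier of their latest task on or before as_of."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE someTasks "
        "(Task_ID INTEGER, Case_ID INTEGER, CRT_DTS TEXT, [Tier 1] TEXT)"
    )
    conn.executemany("INSERT INTO someTasks VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY, {"as_of": as_of}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(latest_tier_counts("2024-01-31"))
    print(latest_tier_counts("2024-12-31"))
```

Because the filter sits inside the subquery, rows after the cutoff are dropped before ROW_NUMBER() ranks them, so each case's "latest" task is the latest one as of that date.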

 

In DAX I came up with the following:

SUMMARIZE (
        Task,
        Task[Tier 1],
        "Profiled", CALCULATE (
            COUNTROWS ( Task ),
            TREATAS (
                SELECTCOLUMNS (
                    SUMMARIZE (
                        Task,
                        Task[CASE_ID],
                        "MaxTask", CALCULATE (
                            MAX ( Task[TASK_ID] ),
                            FILTER ( Task, Task[CRT_DTS] = MAX ( Task[CRT_DTS] ) )
                        )
                    ),
                    "MaxTask", [MaxTask]
                ),
                Task[TASK_ID]
            )
        ),
        "Tasks", COUNTROWS ( Task )
    )

This works, but it is extremely slow (20 seconds to run in DAX Studio) and seems inefficient, even to someone as inexperienced with DAX as I am.

 

Is there a better, more optimized, way to do this? 

 

5 REPLIES
ImkeF
Super User

Hi @scriptpup 

As a first step, please use ADDCOLUMNS for the column to add. Use SUMMARIZE only for the grouping: https://www.sqlbi.com/blog/marco/2012/09/04/optimize-summarize-with-addcolumns-in-dax-ssas-tabular-d... 

 

I haven't studied the logic closely, but nested iterators can be a bit slow.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Thanks @ImkeF, I changed the function to look like this:

 

 

EVALUATE
SUMMARIZE (
        Task,
        Task[Tier 1],
        "Profiled", CALCULATE (
            COUNTROWS ( Task ),
            TREATAS (
                SELECTCOLUMNS (
                    ADDCOLUMNS (
                        SUMMARIZE (
                            Task,
                            Task[CASE_ID]
                        ),
                        "MaxTask", CALCULATE (
                            MAX ( Task[TASK_ID] ),
                            FILTER ( Task, Task[CRT_DTS] = MAX ( Task[CRT_DTS] ) )
                        )
                    ),
                    "MaxTask", [MaxTask]
                ),
                Task[TASK_ID]
            )
        ),
        "Tasks", COUNTROWS ( Task )
    )

 

But I don't see any significant performance gain; it still takes around 20 seconds on average.

 

Edit: it's also now returning 1 for every row, so the ADDCOLUMNS version isn't doing the same thing as the original, or working correctly, either.

Hi @scriptpup

Could you share some sample data, please, so I can understand what you're trying to achieve?

 

Imke Feldmann (The BIccountant)


 

@ImkeF 

It's somewhat difficult to provide sample data since everything I'm working with right now is classified... But as an example, let's say we have these tables:

 

1) Base
     Case_Id
     ... lots of unimportant columns

2) Task
     Task_Id
     Case_Id
     Task_Create_Date
     Tier 1

 

Both of these tables will contain transactional data, one record per Task in the Task table and one record per case for the Base table.

 

I have a relationship between 1 and 2 on Case_Id. My desired end table would look like this:

Tier 1                 Tasks   Profiled
Some_Category          800     300
Some_OtherCategory     55      45
Some_Final_Category    129     66
Etc.                   930     699

 

The profiled column will show ONLY the count of the 'Most recent' tasks associated with the case, using the Task_Create_Date column as the indicator of what is most recent.
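
To make the Tasks versus Profiled distinction concrete, here is a small mock-up in plain Python. The rows and the `tier_summary` helper are invented for illustration and are not the real data. It assumes ISO yyyy-mm-dd date strings, and that ties on Task_Create_Date are broken by the higher Task_Id, mirroring the MAX ( Task[TASK_ID] ) in my DAX.

```python
from collections import Counter

# Hypothetical mock rows: (Task_Id, Case_Id, Task_Create_Date, Tier 1)
TASKS = [
    (1, 100, "2024-01-05", "A"),
    (2, 100, "2024-02-10", "B"),
    (3, 200, "2024-01-20", "A"),
    (4, 200, "2024-03-01", "C"),
    (5, 300, "2024-02-15", "B"),
]


def tier_summary(tasks):
    """Return {tier: (Tasks, Profiled)} for the given task rows."""
    # Tasks: every task counts toward its tier.
    tasks_per_tier = Counter(tier for _, _, _, tier in tasks)

    # Find the most recent task per case.
    # Assumption: ties on the date go to the higher Task_Id.
    latest = {}
    for task in tasks:
        task_id, case_id, created, _ = task
        best = latest.get(case_id)
        if best is None or (created, task_id) > (best[2], best[0]):
            latest[case_id] = task

    # Profiled: only each case's most recent task counts toward its tier.
    profiled = Counter(tier for _, _, _, tier in latest.values())

    return {
        tier: (tasks_per_tier[tier], profiled[tier])
        for tier in sorted(tasks_per_tier)
    }


if __name__ == "__main__":
    for tier, (n_tasks, n_profiled) in tier_summary(TASKS).items():
        print(tier, n_tasks, n_profiled)
```

With these rows, tier A has two tasks but zero profiled, because both of its tasks were later superseded by a newer task on the same case.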

 

I hope this helps. Thanks!

Hi @scriptpup 

And what role does the "Tier" play?

How about just creating some mockup data that return the result from the sample you've given?

 

Imke Feldmann (The BIccountant)

