scriptpup
Frequent Visitor

DAX Filtering calculation by first in partitioned group - Ways to optimize?

This is a somewhat complex situation, but the basic need is pretty simple. As the title says, I'm looking to return a count of only the first row in each partitioned group.

 

In SQL it looks like this:

SELECT
	[Tier 1]
	,COUNT(1) "Tasks"
FROM (
	SELECT
		tsk.[Tier 1]
		,ROW_NUMBER() OVER (PARTITION BY Case_ID ORDER BY CRT_DTS DESC) rw
	FROM someTasks tsk
) partitioned 
WHERE
	rw = 1
GROUP BY
	[Tier 1]

If I then want to change the date range to show what the 'latest' task tier was as of date yyyy-mm-dd, I just add a WHERE clause to the inner query. Simple.

 

In DAX I came up with the following:

SUMMARIZE (
        Task,
        Task[Tier 1],
        "Profiled", CALCULATE (
            COUNTROWS ( Task ),
            TREATAS (
                SELECTCOLUMNS (
                    SUMMARIZE (
                        Task,
                        Task[CASE_ID],
                        "MaxTask", CALCULATE (
                            MAX ( Task[TASK_ID] ),
                            FILTER ( Task, Task[CRT_DTS] = MAX ( Task[CRT_DTS] ) )
                        )
                    ),
                    "MaxTask", [MaxTask]
                ),
                Task[TASK_ID]
            )
        ),
        "Tasks", COUNTROWS ( Task )
    )

This works, but it is extremely slow (20 seconds to run in DAX Studio) and seems inefficient, even to someone as inexperienced with DAX as I am.

 

Is there a better, more optimized, way to do this? 

 

ImkeF
Super User

Hi @scriptpup 

As a first step, please use ADDCOLUMNS to add the calculated column, and use SUMMARIZE only for the grouping: https://www.sqlbi.com/blog/marco/2012/09/04/optimize-summarize-with-addcolumns-in-dax-ssas-tabular-d... 

 

I haven't studied the logic closely, but nested iterators can be a bit slow.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Thanks @ImkeF, I changed the function to look like this:

 

 

EVALUATE
SUMMARIZE (
        Task,
        Task[Tier 1],
        "Profiled", CALCULATE (
            COUNTROWS ( Task ),
            TREATAS (
                SELECTCOLUMNS (
                    ADDCOLUMNS (
                        SUMMARIZE ( Task, Task[CASE_ID] ),
                        "MaxTask", CALCULATE (
                            MAX ( Task[TASK_ID] ),
                            FILTER ( Task, Task[CRT_DTS] = MAX ( Task[CRT_DTS] ) )
                        )
                    ),
                    "MaxTask", [MaxTask]
                ),
                Task[TASK_ID]
            )
        ),
        "Tasks", COUNTROWS ( Task )
    )

 

But I don't see any significant performance gain (it still takes around 20 seconds on average).

 

Edit: it's also now returning '1' for every row, so the ADDCOLUMNS version isn't doing the same thing as the original and isn't working correctly either.

Hi @scriptpup

Could you share some sample data, please, so I can understand what you're trying to achieve?

 


 

@ImkeF 

It's somewhat difficult to provide sample data since everything I'm working with right now is classified... But as an example, let's say we have these tables:

 

1) Base

      Case_Id
      , ... Lots of unimportant columns

2) Task

      Task_Id
      ,Case_Id
      ,Task_Create_Date
      ,Tier 1

 

Both of these tables will contain transactional data, one record per Task in the Task table and one record per case for the Base table.

 

I have a relationship between 1 and 2 on Case_Id. My desired end table would look like this:

Tier 1 | Tasks | Profiled
Some_Category | 800 | 300
Some_OtherCategory | 55 | 45
Some_Final_Category | 129 | 66
Etc. | 930 | 699

 

The Profiled column should count ONLY the most recent task for each case, using the Task_Create_Date column to decide which task is most recent.
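The rule described above (count every task per tier, plus only the latest task of each case per tier) can be checked against a small reference implementation. The following is a minimal Python sketch on hypothetical mock rows, not the poster's data or a DAX solution. The function name `tier_counts`, the `as_of` parameter, and the tie-break on the higher Task_Id are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical mock rows: (task_id, case_id, task_create_date, tier_1)
TASKS = [
    (1, "A", date(2023, 1, 5), "Some_Category"),
    (2, "A", date(2023, 2, 1), "Some_OtherCategory"),
    (3, "B", date(2023, 1, 9), "Some_Category"),
    (4, "C", date(2023, 1, 2), "Some_Category"),
    (5, "C", date(2023, 3, 4), "Some_Category"),
]


def tier_counts(tasks, as_of=None):
    """Return {tier: (tasks, profiled)}.

    tasks    = number of task rows in the tier
    profiled = number of rows in the tier that are the latest task of their case
    as_of    = optional cutoff date; later tasks are ignored (assumed semantics)
    """
    if as_of is not None:
        tasks = [t for t in tasks if t[2] <= as_of]

    # Equivalent of ROW_NUMBER() OVER (PARTITION BY Case_ID ORDER BY date DESC) = 1.
    latest = {}
    for task in tasks:
        task_id, case_id, created, _tier = task
        best = latest.get(case_id)
        # Assumption: ties on the date are broken by the higher task_id.
        if best is None or (created, task_id) > (best[2], best[0]):
            latest[case_id] = task

    total = Counter(t[3] for t in tasks)
    profiled = Counter(t[3] for t in latest.values())
    return {tier: (total[tier], profiled[tier]) for tier in total}


if __name__ == "__main__":
    for tier, (n_tasks, n_profiled) in tier_counts(TASKS).items():
        print(tier, n_tasks, n_profiled)
```

On this mock data, case A's latest task falls in Some_OtherCategory, so Some_Category ends up with 4 tasks but only 2 profiled. A small table like this, together with the expected output, is the kind of mockup that makes it easier to validate a DAX rewrite.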

 

I hope this helps. Thanks!

Hi @scriptpup 

And what role does the "Tier" play?

How about just creating some mockup data that returns the result from the sample you've given?

 
