Solved: Re: MIN in GroupBy not working as expected in SUMM...

chipchidster · ‎07-04-2020

I'm sure I am missing something very basic here but.....

I have two tables (history & history_sprint_summary). History stores all status changes for work items (stories) which relevant timestamps, etc. History sprint summary is a summary table whose purpose is to show the first time a story went into each status. This then drives reporting to show how much work a squad has done - give a particular end status. I use a DAX SUMMARIZE expression to build the summary table:

history_sprint_summary = SUMMARIZE (
 filter(history, history[sprint_to_date] <> BLANK()),
 history[issue_key],
 history[story_points],
 issues[project_key],
 issues[issue_id],
 issue_types[type],
 history[assignee],
 history[to_status],
 "from_date", min( history[from_date] ),
 "max_sprint_date", min( history[sprint_to_date]))

This has worked fine until recently, when a member of one of my squads accidently pushed a load of stories back through the workflow. Now i am seeing duplicates in my summary table - my expectation is that the from_date and max_sprint_date would ensure i only get 1 row per status, per issue - so essentially i see the first time a story went into dev done. I have pasted the data for one of the affected stories below:

issue_key	max_sprint_date	story_points	type	project_key	issue_id	to_status	from_date
PROMO-103	02/12/2019 23:59	5	Story	PROMO	227916	in dev	19/11/2019 11:52
PROMO-103	02/12/2019 23:59	5	Story	PROMO	227916	ready for dev	14/11/2019 17:10
PROMO-103	02/12/2019 23:59	5	Story	PROMO	227916	peer review	20/11/2019 09:48
PROMO-103	02/12/2019 23:59	5	Story	PROMO	227916	qa	27/11/2019 12:21
PROMO-103	02/12/2019 23:59	5	Story	PROMO	227916	release test/uat	28/11/2019 10:05
PROMO-103	01/06/2020 23:59	5	Story	PROMO	227916	release test/uat	29/11/2019 12:57
PROMO-103	29/06/2020 23:59	5	Story	PROMO	227916	dev done	29/05/2020 10:20
PROMO-103	02/12/2019 23:59	5	Story	PROMO	227916	dev done	29/11/2019 11:21
PROMO-103	29/06/2020 23:59	5	Story	PROMO	227916	ready for live	16/06/2020 09:02
PROMO-103	29/06/2020 23:59	5	Story	PROMO	227916	live proving	16/06/2020 09:05

I have previously tested this scenario - but only with dates that were a few weeks prior. The only thing i can think of is that the dates for dev done are in different years - but it feels somewhat flaky if MIN can't handle year boundaries. The max_sprint_date is a date hierarchy.

Any thoughts would be greatful appreciated.

Thanks

chipchidster · ‎07-08-2020

Okay - so it would appear that i was being stupid (as i suspected i was).

The reason this was giving duplicates was because i had an assignee column in the summarize statement. In this instance, two different assignees had moved the issue through the dev done status, and as such, summarize was counting those as unique instances. Removed the assignee col, and hey presto - now i get one row per status.

View solution in original post

mahoneypat · ‎07-04-2020

Please see if this variation on your table expression gets rid of your duplicate rows. I assume since you are doing a MIN that you are getting the right DateTimes in your table (but just had duplicate rows of that).

history_sprint_summary =
ADDCOLUMNS (
    DISTINCT (
        SUMMARIZE (
            FILTER ( history, history[sprint_to_date] <> BLANK () ),
            history[issue_key],
            history[story_points],
            issues[project_key],
            issues[issue_id],
            issue_types[type],
            history[assignee],
            history[to_status]
        )
    ),
    "from_date", CALCULATE ( MIN ( history[from_date] ) ),
    "max_sprint_date", CALCULATE ( MIN ( history[sprint_to_date] ) )
)

If this works for you, please mark it as the solution. Kudos are appreciated too. Please let me know if not.

Regards,

Pat

Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.

@mahoneypa HoosierBI on YouTube

chipchidster · ‎07-04-2020

Hi Pat,

Thanks for the quick response. Sadly that didn't make a different. If you look at the example data in the post - you should be able to see the two rows where the to_status is dev done. I want to make it so that I only have one instance of dev done in the table for that issue - with the earlier date of 02/12/19.

Thanks

Dan

mahoneypat · ‎07-05-2020

Can you explain the issues and issue_type tables and how they relate to your history table?

Regards,

Pat]

Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.

@mahoneypa HoosierBI on YouTube

chipchidster · ‎07-06-2020

Hi Pat

The datmaset is an extract from Jira (our work management system). The tables are organised as follows:

issues - contains 1 record for each work item

issue_type - static table that contains the label value for issue types (e.g. story, spike, bug, etc)

history - contains a record for every status change to an issue (linked to issue by issue_id

history_sprint_summary - should contain one row for the first time an issue changes to a given status

The primary key being used is the issue_id which is a GUID for this dataset. In the summarize statement above - i am pulling in the additional fields to support easier report creation.

chipchidster · ‎07-08-2020

Okay - so it would appear that i was being stupid (as i suspected i was).

The reason this was giving duplicates was because i had an assignee column in the summarize statement. In this instance, two different assignees had moved the issue through the dev done status, and as such, summarize was counting those as unique instances. Removed the assignee col, and hey presto - now i get one row per status.

MIN in GroupBy not working as expected in SUMMARIZE

Helpful resources

Join us at the Microsoft Fabric Community Conference

We want your feedback!

Microsoft Fabric Community Conference 2025

A Year in Review - December 2024

Join us at the 2025 Microsoft Fabric Community Conference