Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
chipchidster
Resolver I
Resolver I

MIN in GroupBy not working as expected in SUMMARIZE

I'm sure I am missing something very basic here but.....

 

I have two tables (history & history_sprint_summary).  History stores all status changes for work items (stories) which relevant timestamps, etc.  History sprint summary is a summary table whose purpose is to show the first time a story went into each status.  This then drives reporting to show how much work a squad has done - give a particular end status.  I use a DAX SUMMARIZE expression to build the summary table:

 

history_sprint_summary = SUMMARIZE (
filter(history, history[sprint_to_date] <> BLANK()),
history[issue_key],
history[story_points],
issues[project_key],
issues[issue_id],
issue_types[type],
history[assignee],
history[to_status],
"from_date", min( history[from_date] ),
"max_sprint_date", min( history[sprint_to_date]))

This has worked fine until recently, when a member of one of my squads accidently pushed a load of stories back through the workflow.  Now i am seeing duplicates in my summary table - my expectation is that the from_date and max_sprint_date would ensure i only get 1 row per status, per issue - so essentially i see the first time a story went into dev done.  I have pasted the data for one of the affected stories below:

 

issue_keymax_sprint_datestory_pointstypeproject_keyissue_idto_statusfrom_date
PROMO-10302/12/2019 23:595StoryPROMO227916in dev19/11/2019 11:52
PROMO-10302/12/2019 23:595StoryPROMO227916ready for dev14/11/2019 17:10
PROMO-10302/12/2019 23:595StoryPROMO227916peer review20/11/2019 09:48
PROMO-10302/12/2019 23:595StoryPROMO227916qa27/11/2019 12:21
PROMO-10302/12/2019 23:595StoryPROMO227916release test/uat28/11/2019 10:05
PROMO-10301/06/2020 23:595StoryPROMO227916release test/uat29/11/2019 12:57
PROMO-10329/06/2020 23:595StoryPROMO227916dev done29/05/2020 10:20
PROMO-10302/12/2019 23:595StoryPROMO227916dev done29/11/2019 11:21
PROMO-10329/06/2020 23:595StoryPROMO227916ready for live16/06/2020 09:02
PROMO-10329/06/2020 23:595StoryPROMO227916live proving

16/06/2020 09:05

 

I have previously tested this scenario - but only with dates that were a few weeks prior.  The only thing i can think of is that the dates for dev done are in different years - but it feels somewhat flaky if MIN can't handle year boundaries.  The max_sprint_date is a date hierarchy.  

 

Any thoughts would be greatful appreciated.

Thanks

 

 

1 ACCEPTED SOLUTION

Okay - so it would appear that i was being stupid (as i suspected i was).

 

The reason this was giving duplicates was because i had an assignee column in the summarize statement.  In this instance, two different assignees had moved the issue through the dev done status, and as such, summarize was counting those as unique instances.  Removed the assignee col, and hey presto - now i get one row per status.

View solution in original post

5 REPLIES 5
mahoneypat
Microsoft Employee
Microsoft Employee

Please see if this variation on your table expression gets rid of your duplicate rows.  I assume since you are doing a MIN that you are getting the right DateTimes in your table (but just had duplicate rows of that).

 

history_sprint_summary =
ADDCOLUMNS (
    DISTINCT (
        SUMMARIZE (
            FILTER ( history, history[sprint_to_date] <> BLANK () ),
            history[issue_key],
            history[story_points],
            issues[project_key],
            issues[issue_id],
            issue_types[type],
            history[assignee],
            history[to_status]
        )
    ),
    "from_date", CALCULATE ( MIN ( history[from_date] ) ),
    "max_sprint_date", CALCULATE ( MIN ( history[sprint_to_date] ) )
)

 

If this works for you, please mark it as the solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Hi Pat, 

Thanks for the quick response.  Sadly that didn't make a different.  If you look at the example data in the post - you should be able to see the two rows where the to_status is dev done.  I want to make it so that I only have one instance of dev done in the table for that issue - with the earlier date of 02/12/19.

Thanks

Dan

Can you explain the issues and issue_type tables and how they relate to your history table?  

 

Regards,

Pat]





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Hi Pat

The datmaset is an extract from Jira (our work management system).  The tables are organised as follows:

issues - contains 1 record for each work item

issue_type - static table that contains the label value for issue types (e.g. story, spike, bug, etc)

history - contains a record for every status change to an issue (linked to issue by issue_id

history_sprint_summary - should contain one row for the first time an issue changes to a given status

 

The primary key being used is the issue_id which is a GUID for this dataset.  In the summarize statement above - i am pulling in the additional fields to support easier report creation.

Okay - so it would appear that i was being stupid (as i suspected i was).

 

The reason this was giving duplicates was because i had an assignee column in the summarize statement.  In this instance, two different assignees had moved the issue through the dev done status, and as such, summarize was counting those as unique instances.  Removed the assignee col, and hey presto - now i get one row per status.

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.

Top Solution Authors