March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
I'm sure I am missing something very basic here but.....
I have two tables (history & history_sprint_summary). History stores all status changes for work items (stories) which relevant timestamps, etc. History sprint summary is a summary table whose purpose is to show the first time a story went into each status. This then drives reporting to show how much work a squad has done - give a particular end status. I use a DAX SUMMARIZE expression to build the summary table:
history_sprint_summary = SUMMARIZE (
filter(history, history[sprint_to_date] <> BLANK()),
history[issue_key],
history[story_points],
issues[project_key],
issues[issue_id],
issue_types[type],
history[assignee],
history[to_status],
"from_date", min( history[from_date] ),
"max_sprint_date", min( history[sprint_to_date]))
This has worked fine until recently, when a member of one of my squads accidently pushed a load of stories back through the workflow. Now i am seeing duplicates in my summary table - my expectation is that the from_date and max_sprint_date would ensure i only get 1 row per status, per issue - so essentially i see the first time a story went into dev done. I have pasted the data for one of the affected stories below:
issue_key | max_sprint_date | story_points | type | project_key | issue_id | to_status | from_date |
PROMO-103 | 02/12/2019 23:59 | 5 | Story | PROMO | 227916 | in dev | 19/11/2019 11:52 |
PROMO-103 | 02/12/2019 23:59 | 5 | Story | PROMO | 227916 | ready for dev | 14/11/2019 17:10 |
PROMO-103 | 02/12/2019 23:59 | 5 | Story | PROMO | 227916 | peer review | 20/11/2019 09:48 |
PROMO-103 | 02/12/2019 23:59 | 5 | Story | PROMO | 227916 | qa | 27/11/2019 12:21 |
PROMO-103 | 02/12/2019 23:59 | 5 | Story | PROMO | 227916 | release test/uat | 28/11/2019 10:05 |
PROMO-103 | 01/06/2020 23:59 | 5 | Story | PROMO | 227916 | release test/uat | 29/11/2019 12:57 |
PROMO-103 | 29/06/2020 23:59 | 5 | Story | PROMO | 227916 | dev done | 29/05/2020 10:20 |
PROMO-103 | 02/12/2019 23:59 | 5 | Story | PROMO | 227916 | dev done | 29/11/2019 11:21 |
PROMO-103 | 29/06/2020 23:59 | 5 | Story | PROMO | 227916 | ready for live | 16/06/2020 09:02 |
PROMO-103 | 29/06/2020 23:59 | 5 | Story | PROMO | 227916 | live proving | 16/06/2020 09:05 |
I have previously tested this scenario - but only with dates that were a few weeks prior. The only thing i can think of is that the dates for dev done are in different years - but it feels somewhat flaky if MIN can't handle year boundaries. The max_sprint_date is a date hierarchy.
Any thoughts would be greatful appreciated.
Thanks
Solved! Go to Solution.
Okay - so it would appear that i was being stupid (as i suspected i was).
The reason this was giving duplicates was because i had an assignee column in the summarize statement. In this instance, two different assignees had moved the issue through the dev done status, and as such, summarize was counting those as unique instances. Removed the assignee col, and hey presto - now i get one row per status.
Please see if this variation on your table expression gets rid of your duplicate rows. I assume since you are doing a MIN that you are getting the right DateTimes in your table (but just had duplicate rows of that).
history_sprint_summary =
ADDCOLUMNS (
DISTINCT (
SUMMARIZE (
FILTER ( history, history[sprint_to_date] <> BLANK () ),
history[issue_key],
history[story_points],
issues[project_key],
issues[issue_id],
issue_types[type],
history[assignee],
history[to_status]
)
),
"from_date", CALCULATE ( MIN ( history[from_date] ) ),
"max_sprint_date", CALCULATE ( MIN ( history[sprint_to_date] ) )
)
If this works for you, please mark it as the solution. Kudos are appreciated too. Please let me know if not.
Regards,
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
Hi Pat,
Thanks for the quick response. Sadly that didn't make a different. If you look at the example data in the post - you should be able to see the two rows where the to_status is dev done. I want to make it so that I only have one instance of dev done in the table for that issue - with the earlier date of 02/12/19.
Thanks
Dan
Can you explain the issues and issue_type tables and how they relate to your history table?
Regards,
Pat]
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
Hi Pat
The datmaset is an extract from Jira (our work management system). The tables are organised as follows:
issues - contains 1 record for each work item
issue_type - static table that contains the label value for issue types (e.g. story, spike, bug, etc)
history - contains a record for every status change to an issue (linked to issue by issue_id
history_sprint_summary - should contain one row for the first time an issue changes to a given status
The primary key being used is the issue_id which is a GUID for this dataset. In the summarize statement above - i am pulling in the additional fields to support easier report creation.
Okay - so it would appear that i was being stupid (as i suspected i was).
The reason this was giving duplicates was because i had an assignee column in the summarize statement. In this instance, two different assignees had moved the issue through the dev done status, and as such, summarize was counting those as unique instances. Removed the assignee col, and hey presto - now i get one row per status.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
23 | |
16 | |
12 | |
9 | |
7 |
User | Count |
---|---|
38 | |
32 | |
28 | |
12 | |
11 |