Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
ChandeepChhabra
Impactful Individual
Impactful Individual

Grouping Problem

I am not sure where am I making a mistake. PQ (M) geniuses please help!

 

I have this 2 columnar table (Dates are in dd-mm)

WinLoss  Date

Win01-01-2024
Loss01-01-2024
Loss02-01-2024
Loss03-01-2024
Loss04-01-2024
Win03-01-2024
Win04-01-2024
Win05-01-2024
Win05-01-2024
Win05-01-2024

 

I want to group this data and add a 3rd column to this table as Streak Count. The logic is as follows:

  1. A streak gets counted only if the win/loss is on a consecutive day.
  2. If the next win/loss is on the same day it is not considered as a streak.

 Here is the expected output.

WinLossDate Streak
Win01-Jan-24 1
Loss01-Jan-24 4
Win03-Jan-24 3
Win05-Jan-24 1
Win05-Jan-24 1

 

I only want to solve this using Table.Group function's 5th parameter. Here is my code so far and it doens't seem to work.

 

let
 Source = Excel.CurrentWorkbook(){[Name="WinLossWDates"]}[Content],
 DataTypes = Table.TransformColumnTypes(Source,{{"Date", type date}, {"WinLoss", type text}}),
 Group = Table.Group(
  DataTypes,
  {"WinLoss", "Date"}, {"Streak", each _}, GroupKind.Local,
  (cr, nr) => if Duration.Days(nr[Date] - cr[Date]) = 1 and cr[WinLoss] = nr[WinLoss] then 0 else 1
)
in
 Group

 

By the way : the new chat gpt sucks!

 

Thank you

C

2 ACCEPTED SOLUTIONS
AlienSx
Super User
Super User

Hello, @ChandeepChhabra I don't think you'll solve this with Table.Group and it's 5th agrument alone. Consider index column to control number of rows in your current group.

let
    Source = Excel.CurrentWorkbook(){[Name="WinLossWDates"]}[Content],
    idx = Table.AddIndexColumn(Source, "idx"),
    group = Table.Group(
        idx, 
        {"WinLoss", "Date", "idx"}, 
        {"Streak", Table.RowCount}, 
        GroupKind.Local, 
        (s, c) => Number.From(
            s[WinLoss] <> c[WinLoss] or 
            Duration.Days(c[Date] - s[Date]) <> (c[idx] - s[idx])
        )
    )
in
    group

 

View solution in original post

Lets start with first 2 rows: 

sc01.png

We start with

State = [Winloss = "Win", Date = #date(2024, 01, 01)]

and we compare this record to the very next record

Current =  [Winloss = "Loss", Date = #date(2024, 01, 01)]

Do we need to start new group here? YES, because State[Winloss] <> Current[Winloss].

And we start new group with 

State = [WinLoss = "Loss", Date = #date(2024, 01, 01)] and check for the very next record which is 

Current = [WinLoss = "Loss", Date = #date(2024, 02, 01)]. It's "Current (1)" over here:

sc03.png

 

 Do we need to start new group at Current (1)? NO, because State[Winloss] = Current[Winloss] and duration between Current[Date] and State[Date] is 1 day exactly. We keep grouping so that "State" stays the same! It always stays the same withing the group! 

Next step: we stay in the same group with the same "State" but the very next "Current" becomes our next record

Current (2) = [WinLoss = "Loss", Date = #date(2024, 03, 01)]. Again, "State" is the same as we are in the same group. We need to make a decision: start new group or not? WinLoss field is same but what about Date? At this point we have State[Date] (which is Jan 1st) and Current[Date] (which is Jan 3rd). We don't have info about previous row with Date = Jan 2nd anymore but we need to make a decision.

That's why we need some helper column in the form of index to calculate something like "running total" between "State" and "Current" to compare it to duration in days in this particular case. 

The key is that the only information we have at each stage of iteration is just the very first ("state") row/record and current row. Iterator moves on. So that at each step the arguments of our desicion making function (5th argument) are 2 rows/records: the very first one and the current one. Decision is always the same: do we need to start new group or stay in the current group with the same "State" record and next "Current" record. 

View solution in original post

8 REPLIES 8
v-junyant-msft
Community Support
Community Support

Hi @ChandeepChhabra ,

Have you solved your problem?
If solved please mark the reply in this post which you think is helpful as a solution to help more others facing the same problem to find a solution quickly, thank you very much!

Best Regards,
Dino Tao

Omid_Motamedise
Impactful Individual
Impactful Individual

You can use this

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    DataTypes = Table.TransformColumnTypes(Source,{{"Date", type date}, {"WinLoss", type text}}),
    #"Added Index" = Table.AddIndexColumn(DataTypes, "Index", 0, 1, Int64.Type),
 Group = Table.Group(
  #"Added Index",
  {"WinLoss", "Date","Index"}, {"Streak", each _}, GroupKind.Local,
  (cr, nr) => if Duration.Days(nr[Date] - cr[Date]) =(nr[Index] - cr[Index])  and cr[WinLoss] = nr[WinLoss] then 0 else 1
)
in
 Group
AlienSx
Super User
Super User

Hello, @ChandeepChhabra I don't think you'll solve this with Table.Group and it's 5th agrument alone. Consider index column to control number of rows in your current group.

let
    Source = Excel.CurrentWorkbook(){[Name="WinLossWDates"]}[Content],
    idx = Table.AddIndexColumn(Source, "idx"),
    group = Table.Group(
        idx, 
        {"WinLoss", "Date", "idx"}, 
        {"Streak", Table.RowCount}, 
        GroupKind.Local, 
        (s, c) => Number.From(
            s[WinLoss] <> c[WinLoss] or 
            Duration.Days(c[Date] - s[Date]) <> (c[idx] - s[idx])
        )
    )
in
    group

 

I understand the solution but can you explain the reason for adding an index and why can't it work without the index? 

Basics: 5th argument of Table.Group is a function of 2 arguments. I usually call them "s" (state) and "c" (current). "State" is first record (table row) in the group of records and consists of fields listed in 2nd argument. "Current" is current record (with same list of fields) that we compare to "state" in our function. If function evaluates to 1 then new group is started. As you see we don't have an access to the whole group of records. E.g. we can't calculate current number of rows in it, we don't have an access to previous (last) record in group etc. Just the very first (s) and current (c) records. We can calculate duration in days between "state" and "current". But how do we know if it's time to start new group? We need to compare this number of days with smth else that we don't have (like number of rows in our group). That's why I add index to compare duration with difference in index values. 

Sorry I didn't understand. Here is my confusion.

  1. Can you point out which exact rows are S and C referring to as the iterations progresses?
  2. Even in your logic why can't we compare Duration.Days output to a single value like 1?

Thank you so much for your time 🙏

 

Lets start with first 2 rows: 

sc01.png

We start with

State = [Winloss = "Win", Date = #date(2024, 01, 01)]

and we compare this record to the very next record

Current =  [Winloss = "Loss", Date = #date(2024, 01, 01)]

Do we need to start new group here? YES, because State[Winloss] <> Current[Winloss].

And we start new group with 

State = [WinLoss = "Loss", Date = #date(2024, 01, 01)] and check for the very next record which is 

Current = [WinLoss = "Loss", Date = #date(2024, 02, 01)]. It's "Current (1)" over here:

sc03.png

 

 Do we need to start new group at Current (1)? NO, because State[Winloss] = Current[Winloss] and duration between Current[Date] and State[Date] is 1 day exactly. We keep grouping so that "State" stays the same! It always stays the same withing the group! 

Next step: we stay in the same group with the same "State" but the very next "Current" becomes our next record

Current (2) = [WinLoss = "Loss", Date = #date(2024, 03, 01)]. Again, "State" is the same as we are in the same group. We need to make a decision: start new group or not? WinLoss field is same but what about Date? At this point we have State[Date] (which is Jan 1st) and Current[Date] (which is Jan 3rd). We don't have info about previous row with Date = Jan 2nd anymore but we need to make a decision.

That's why we need some helper column in the form of index to calculate something like "running total" between "State" and "Current" to compare it to duration in days in this particular case. 

The key is that the only information we have at each stage of iteration is just the very first ("state") row/record and current row. Iterator moves on. So that at each step the arguments of our desicion making function (5th argument) are 2 rows/records: the very first one and the current one. Decision is always the same: do we need to start new group or stay in the current group with the same "State" record and next "Current" record. 

Thank you so much for clarifying. I totally get it now. I was missing the key piece in the puzzle. I owe you a beer when we meet 😁

Helpful resources

Announcements
Sept PBI Carousel

Power BI Monthly Update - September 2024

Check out the September 2024 Power BI update to learn about new features.

September Hackathon Carousel

Microsoft Fabric & AI Learning Hackathon

Learn from experts, get hands-on experience, and win awesome prizes.

Sept NL Carousel

Fabric Community Update - September 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors