Solved: Grouping Problem

ChandeepChhabra · ‎09-24-2024

I am not sure where am I making a mistake. PQ (M) geniuses please help!

I have this 2 columnar table (Dates are in dd-mm)

WinLoss Date

Win	01-01-2024
Loss	01-01-2024
Loss	02-01-2024
Loss	03-01-2024
Loss	04-01-2024
Win	03-01-2024
Win	04-01-2024
Win	05-01-2024
Win	05-01-2024
Win	05-01-2024

I want to group this data and add a 3rd column to this table as Streak Count. The logic is as follows:

A streak gets counted only if the win/loss is on a consecutive day.
If the next win/loss is on the same day it is not considered as a streak.

Here is the expected output.

WinLoss	Date	Streak
Win	01-Jan-24	1
Loss	01-Jan-24	4
Win	03-Jan-24	3
Win	05-Jan-24	1
Win	05-Jan-24	1

I only want to solve this using Table.Group function's 5th parameter. Here is my code so far and it doens't seem to work.

let
Source = Excel.CurrentWorkbook(){[Name="WinLossWDates"]}[Content],
DataTypes = Table.TransformColumnTypes(Source,{{"Date", type date}, {"WinLoss", type text}}),
Group = Table.Group(
DataTypes,
{"WinLoss", "Date"}, {"Streak", each _}, GroupKind.Local,
(cr, nr) => if Duration.Days(nr[Date] - cr[Date]) = 1 and cr[WinLoss] = nr[WinLoss] then 0 else 1
)
in
Group

By the way : the new chat gpt sucks!

Thank you

C

AlienSx · ‎09-24-2024

Hello, @ChandeepChhabra I don't think you'll solve this with Table.Group and it's 5th agrument alone. Consider index column to control number of rows in your current group.

let
    Source = Excel.CurrentWorkbook(){[Name="WinLossWDates"]}[Content],
    idx = Table.AddIndexColumn(Source, "idx"),
    group = Table.Group(
        idx, 
        {"WinLoss", "Date", "idx"}, 
        {"Streak", Table.RowCount}, 
        GroupKind.Local, 
        (s, c) => Number.From(
            s[WinLoss] <> c[WinLoss] or 
            Duration.Days(c[Date] - s[Date]) <> (c[idx] - s[idx])
        )
    )
in
    group

View solution in original post

AlienSx · ‎09-24-2024

Lets start with first 2 rows:

We start with

State = [Winloss = "Win", Date = #date(2024, 01, 01)]

and we compare this record to the very next record

Current = [Winloss = "Loss", Date = #date(2024, 01, 01)]

Do we need to start new group here? YES, because State[Winloss] <> Current[Winloss].

And we start new group with

State = [WinLoss = "Loss", Date = #date(2024, 01, 01)] and check for the very next record which is

Current = [WinLoss = "Loss", Date = #date(2024, 02, 01)]. It's "Current (1)" over here:

Do we need to start new group at Current (1)? NO, because State[Winloss] = Current[Winloss] and duration between Current[Date] and State[Date] is 1 day exactly. We keep grouping so that "State" stays the same! It always stays the same withing the group!

Next step: we stay in the same group with the same "State" but the very next "Current" becomes our next record

Current (2) = [WinLoss = "Loss", Date = #date(2024, 03, 01)]. Again, "State" is the same as we are in the same group. We need to make a decision: start new group or not? WinLoss field is same but what about Date? At this point we have State[Date] (which is Jan 1st) and Current[Date] (which is Jan 3rd). We don't have info about previous row with Date = Jan 2nd anymore but we need to make a decision.

That's why we need some helper column in the form of index to calculate something like "running total" between "State" and "Current" to compare it to duration in days in this particular case.

The key is that the only information we have at each stage of iteration is just the very first ("state") row/record and current row. Iterator moves on. So that at each step the arguments of our desicion making function (5th argument) are 2 rows/records: the very first one and the current one. Decision is always the same: do we need to start new group or stay in the current group with the same "State" record and next "Current" record.

View solution in original post

Anonymous · ‎09-30-2024

Hi @ChandeepChhabra ,

Have you solved your problem?
If solved please mark the reply in this post which you think is helpful as a solution to help more others facing the same problem to find a solution quickly, thank you very much!

Best Regards,
Dino Tao

Omid_Motamedise · ‎09-24-2024

You can use this

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    DataTypes = Table.TransformColumnTypes(Source,{{"Date", type date}, {"WinLoss", type text}}),
    #"Added Index" = Table.AddIndexColumn(DataTypes, "Index", 0, 1, Int64.Type),
 Group = Table.Group(
  #"Added Index",
  {"WinLoss", "Date","Index"}, {"Streak", each _}, GroupKind.Local,
  (cr, nr) => if Duration.Days(nr[Date] - cr[Date]) =(nr[Index] - cr[Index])  and cr[WinLoss] = nr[WinLoss] then 0 else 1
)
in
 Group

If my answer helped solve your issue, please consider marking it as the accepted solution. It helps others in the community find answers faster—and keeps the community growing stronger!
You can also check out my YouTube channel for tutorials, tips, and real-world solutions in Power Query with the following link
https://youtube.com/@omidbi?si=96Bo-ZsSwOx0Z36h

AlienSx · ‎09-24-2024

Hello, @ChandeepChhabra I don't think you'll solve this with Table.Group and it's 5th agrument alone. Consider index column to control number of rows in your current group.

let
    Source = Excel.CurrentWorkbook(){[Name="WinLossWDates"]}[Content],
    idx = Table.AddIndexColumn(Source, "idx"),
    group = Table.Group(
        idx, 
        {"WinLoss", "Date", "idx"}, 
        {"Streak", Table.RowCount}, 
        GroupKind.Local, 
        (s, c) => Number.From(
            s[WinLoss] <> c[WinLoss] or 
            Duration.Days(c[Date] - s[Date]) <> (c[idx] - s[idx])
        )
    )
in
    group

ChandeepChhabra · ‎09-24-2024

I understand the solution but can you explain the reason for adding an index and why can't it work without the index?

AlienSx · ‎09-24-2024

Basics: 5th argument of Table.Group is a function of 2 arguments. I usually call them "s" (state) and "c" (current). "State" is first record (table row) in the group of records and consists of fields listed in 2nd argument. "Current" is current record (with same list of fields) that we compare to "state" in our function. If function evaluates to 1 then new group is started. As you see we don't have an access to the whole group of records. E.g. we can't calculate current number of rows in it, we don't have an access to previous (last) record in group etc. Just the very first (s) and current (c) records. We can calculate duration in days between "state" and "current". But how do we know if it's time to start new group? We need to compare this number of days with smth else that we don't have (like number of rows in our group). That's why I add index to compare duration with difference in index values.

ChandeepChhabra · ‎09-24-2024

Sorry I didn't understand. Here is my confusion.

Can you point out which exact rows are S and C referring to as the iterations progresses?
Even in your logic why can't we compare Duration.Days output to a single value like 1?

Thank you so much for your time 🙏

AlienSx · ‎09-24-2024