Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
OzzyM91
Frequent Visitor

Find difference between current and previous row - performance issue

Hello,

I've got a dataset of transactions, including various salesmen with transactions times. I need to create a calculated column returning datetime of previous transaction, allowing me to subsequently calculate datetime difference and build other parameters around it.

 

3fec3cadb8b6293880c22c854744a0c1.png

 I've browsed forums and ended up with smooth solution using EARLIER:

 

Expected Result (Prev Transaction) = CALCULATE(
        MAX(Example[Transaction Time]),
        Example,
        Example[Transaction Time] < EARLIER(Example[Transaction Time]),
        Example[Salesman] = EARLIER(Example[Salesman])
        )

 


However it performs very poorly over my real dataset of 500k rows and multiple salesmen, it ran out of memory after few minutes.

 

Once Ive changed datetime column 'Transaction Time' to date, it did complete quickly, but since I need to calculate exact time difference between transactions, such solution isn't enough and I'm looking for improved solution for datetime data.

 

Looking forward to any hints!

1 ACCEPTED SOLUTION

Ok, try a different approach in Power Query. Basically it entails creating a single table with the transaction time and the next transacion time. Then leave the calculations to measures (which are simple since both transaction times are on the same row of the table. 

To do this, you need to sort the table by salesman and Transaction time (in ascending order), and add an index column starting at 1

index 1.gif

Now duplicate the table and change the index order to start at 0

index 0.gif

Now you can merge both tables by selecting salesman and index, and keep only the new transaction row from the second table:

merge.gif(You can disable load for the second table since you don't need it in the model)

If you prefer to do this whole process with a single query, here is the M code:

let
  Source = Table.FromRows(
    Json.Document(
      Binary.Decompress(
        Binary.FromText(
          "i45WCsnPVdJRMjLSNzDUNzIwMlIwNLIyMFCK1cEmZQqT8s1MzkhMzQFJGyOkTfBLm1kZoxiMJGVuZWKKR6cFmpNMsLgWSacJfhebkGYwVMoUWRfII7EA",
          BinaryEncoding.Base64
        ),
        Compression.Deflate
      )
    ),
    let
      _t = ((type nullable text) meta [Serialized.Text = true])
    in
      type table [Salesman = _t, #"Transaction time" = _t]
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    Source,
    {{"Salesman", type text}, {"Transaction time", type datetime}}
  ),
  #"Sorted Rows" = Table.Sort(
    #"Changed Type",
    {{"Salesman", Order.Ascending}, {"Transaction time", Order.Ascending}}
  ),
  #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
  Source1 = Source,
  #"Changed Type1" = Table.TransformColumnTypes(
    Source1,
    {{"Salesman", type text}, {"Transaction time", type datetime}}
  ),
  #"Sorted Rows1" = Table.Sort(
    #"Changed Type1",
    {{"Salesman", Order.Ascending}, {"Transaction time", Order.Ascending}}
  ),
  #"Added Index1" = Table.AddIndexColumn(#"Sorted Rows1", "Index", 0, 1, Int64.Type),
  #"Merged Queries" = Table.NestedJoin(
    #"Added Index",
    {"Salesman", "Index"},
    #"Added Index1",
    {"Salesman", "Index"},
    "Next transaction",
    JoinKind.LeftOuter
  ),
  #"Expanded Next Transaction" = Table.ExpandTableColumn(
    #"Merged Queries",
    "Next transaction",
    {"Transaction time"},
    {"Next transaction"}
  ),
  #"Removed Columns" = Table.RemoveColumns(#"Expanded Next Transaction", {"Index"})
in
  #"Removed Columns"

Now the measures are much simpler:

Difference in Minutes =
DATEDIFF (
    MAX ( 'TD on rows'[Transaction time] ),
    MAX ( 'TD on rows'[Next transaction] ),
    MINUTE
)
Diff vs Next transaction time (HH:MM) =
VAR _MinutesDiff =
    DATEDIFF (
        MAX ( 'TD on rows'[Transaction time] ),
        MAX ( 'TD on rows'[Next transaction] ),
        MINUTE
    )
VAR _FinalHours =
    INT ( DIVIDE ( _MinutesDiff, 60 ) )
VAR _FinalMinutes =
    FORMAT ( MOD ( _MinutesDiff, 60 ), "00" )
RETURN
    IF (
        ISBLANK ( MAX ( 'TD on rows'[Next transaction] ) ),
        BLANK (),
        _FinalHours & ":" & _FinalMinutes
    )

result rows.png

 

New file attached

 





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.






View solution in original post

6 REPLIES 6
PaulDBrown
Community Champion
Community Champion

Any particular reason you need this as a calculated column? Typically these calculations are done with measures





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.






Oh, not really, measure would be fine as well, I just couldn't find any reasonable measure solution that wouldn't cause same performance issue with EARLIER going through large table of datetime records.

 

I'm beginner in Power BI, so its likely I've just missed some simple measure solution I guess?

Ok, so here are a couple of measures to do the calculations.

To get the previous transaction date/time by salesman:

 

Prev Transaction =
CALCULATE (
    MAX ( fTable[Transaction time] ),
    FILTER (
        ALLEXCEPT ( fTable, fTable[Salesman] ),
        fTable[Transaction time] < MAX ( fTable[Transaction time] )
    )
)

 

If you want to calculate the difference in hours and minutes directly, use:

 

Diff vs Previous transaction time (HH:MM) =
VAR _Prev =
    CALCULATE (
        MAX ( fTable[Transaction time] ),
        FILTER (
            ALLEXCEPT ( fTable, fTable[Salesman] ),
            fTable[Transaction time] < MAX ( fTable[Transaction time] )
        )
    )
VAR _MinutesDiff =
    DATEDIFF ( _Prev, MAX ( fTable[Transaction time] ), MINUTE )
VAR _FinalHours =
    INT ( DIVIDE ( _MinutesDiff, 60 ) )
VAR _FinalMinutes =
    FORMAT ( MOD ( _MinutesDiff, 60 ), "00" )
RETURN
    IF ( ISBLANK ( _Prev ), BLANK (), _FinalHours & ":" & _FinalMinutes )

 

Bear in mind that the above measure is formatted as text (since you cannot have a time value where the hours > 24). So basically this is useful to display in tables or matrices, but you cannot be used for calculations or in visuals which require numeric values.

If you need to make calculations of need to display the time difference in visuals requiring numeric values, you will need to use the difference in minutes:

 

Diff in minutes =
VAR _Prev =
    CALCULATE (
        MAX ( fTable[Transaction time] ),
        FILTER (
            ALLEXCEPT ( fTable, fTable[Salesman] ),
            fTable[Transaction time] < MAX ( fTable[Transaction time] )
        )
    )
VAR _MinutesDiff =
    DATEDIFF ( _Prev, MAX ( fTable[Transaction time] ), MINUTE )
RETURN
    _MinutesDiff

 

You can of course include the HH:MM measure in the tooltips:

2022-10-05.pngI've attached the sample PBIX file

 





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.






Hey, I've tried your solution for retrieving previous transaction time, and although it works it's performing poorly as well.

 

It took 24 seconds to load full table (just name, transaction time and previous transaction) on 15k rows test dataset. On 40k set it ran out of memory, and my real dataset is 500k and might grow to 1mln+ in future.

 

So I assume it means I have to step back to my original data?

Ive tried something different in the meantime. I went back to power query and I've added Transaction ID to my Transactions sorted by transaction time:

 

b7af40e8456987ae8644771c2ae123ba.png

 

 Then I've duplicated Salesman/Transaction ID into new table, grouped by salesman and ended up with a list of transaction IDs sorted by datetime, which I've then expanded back:

14beb8e27101d987950c48cbb0d84a6e.png


 Up to this point everything works smoothly on 500k dataset, but now comes the bottleneck:

 

 

#"Added Custom1" = Table.AddColumn(#"Sorted Rows1", "Custom", each [Transactions List]{List.PositionOf([Transactions List],[Transaction ID])+1})

 

 


It does return proper data, which is now very easy to transform into transaction time and then, with functions you've made to time difference.

 

5eb152ba8a445caa1e98e934805f3259.png

 

However it took about 2 minutes on 40k dataset, which is improvement, but obviously still too slow for my real data, as one of salesman has 60k transactions, meaning list of 60k length it has to search through... 

 

PS: Oh it might get little confusing since now Ive been looking for next transaction instead of previous, but my final target is simply datetime difference, doesn't matter whether vs previous or next.

Ok, try a different approach in Power Query. Basically it entails creating a single table with the transaction time and the next transacion time. Then leave the calculations to measures (which are simple since both transaction times are on the same row of the table. 

To do this, you need to sort the table by salesman and Transaction time (in ascending order), and add an index column starting at 1

index 1.gif

Now duplicate the table and change the index order to start at 0

index 0.gif

Now you can merge both tables by selecting salesman and index, and keep only the new transaction row from the second table:

merge.gif(You can disable load for the second table since you don't need it in the model)

If you prefer to do this whole process with a single query, here is the M code:

let
  Source = Table.FromRows(
    Json.Document(
      Binary.Decompress(
        Binary.FromText(
          "i45WCsnPVdJRMjLSNzDUNzIwMlIwNLIyMFCK1cEmZQqT8s1MzkhMzQFJGyOkTfBLm1kZoxiMJGVuZWKKR6cFmpNMsLgWSacJfhebkGYwVMoUWRfII7EA",
          BinaryEncoding.Base64
        ),
        Compression.Deflate
      )
    ),
    let
      _t = ((type nullable text) meta [Serialized.Text = true])
    in
      type table [Salesman = _t, #"Transaction time" = _t]
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    Source,
    {{"Salesman", type text}, {"Transaction time", type datetime}}
  ),
  #"Sorted Rows" = Table.Sort(
    #"Changed Type",
    {{"Salesman", Order.Ascending}, {"Transaction time", Order.Ascending}}
  ),
  #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
  Source1 = Source,
  #"Changed Type1" = Table.TransformColumnTypes(
    Source1,
    {{"Salesman", type text}, {"Transaction time", type datetime}}
  ),
  #"Sorted Rows1" = Table.Sort(
    #"Changed Type1",
    {{"Salesman", Order.Ascending}, {"Transaction time", Order.Ascending}}
  ),
  #"Added Index1" = Table.AddIndexColumn(#"Sorted Rows1", "Index", 0, 1, Int64.Type),
  #"Merged Queries" = Table.NestedJoin(
    #"Added Index",
    {"Salesman", "Index"},
    #"Added Index1",
    {"Salesman", "Index"},
    "Next transaction",
    JoinKind.LeftOuter
  ),
  #"Expanded Next Transaction" = Table.ExpandTableColumn(
    #"Merged Queries",
    "Next transaction",
    {"Transaction time"},
    {"Next transaction"}
  ),
  #"Removed Columns" = Table.RemoveColumns(#"Expanded Next Transaction", {"Index"})
in
  #"Removed Columns"

Now the measures are much simpler:

Difference in Minutes =
DATEDIFF (
    MAX ( 'TD on rows'[Transaction time] ),
    MAX ( 'TD on rows'[Next transaction] ),
    MINUTE
)
Diff vs Next transaction time (HH:MM) =
VAR _MinutesDiff =
    DATEDIFF (
        MAX ( 'TD on rows'[Transaction time] ),
        MAX ( 'TD on rows'[Next transaction] ),
        MINUTE
    )
VAR _FinalHours =
    INT ( DIVIDE ( _MinutesDiff, 60 ) )
VAR _FinalMinutes =
    FORMAT ( MOD ( _MinutesDiff, 60 ), "00" )
RETURN
    IF (
        ISBLANK ( MAX ( 'TD on rows'[Next transaction] ) ),
        BLANK (),
        _FinalHours & ":" & _FinalMinutes
    )

result rows.png

 

New file attached

 





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.






It works great now! Takes a while longer to load data but measures and visuals are calculated instantly 😎 Thank you for help and your great contribution to community!

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Jan25PBI_Carousel

Power BI Monthly Update - January 2025

Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.

Jan NL Carousel

Fabric Community Update - January 2025

Find out what's new and trending in the Fabric community.