Anonymous
Not applicable

Calculate date difference between two consecutive rows grouped by column using python script

Hi all,

 

I am working on clinical data where I need to calculate sessions for patients by computing the date difference between consecutive rows. The script works in plain Python, but when I use the same script in Power BI's Power Query it fails with the error below:

DataSource.Error: ADO.NET: Python script error.
<pi>TypeError: unsupported operand type(s) for -: 'str' and 'str'
</pi>
Details:
DataSourceKind=Python
DataSourcePath=Python
Message=Python script error.
<pi>TypeError: unsupported operand type(s) for -: 'str' and 'str'
</pi>
ErrorCode=-2147467259
ExceptionType=Microsoft.PowerBI.Scripting

The original data is as below:

data.PNG

The expected output is as below:

CD.PNG

 

The Python script I used is:

# 'dataset' holds the input data for this script
import pandas as pd
import numpy as np
import datetime
dataset['Days_btw'] = dataset.groupby('PatientID')['SessionDate'].diff() / np.timedelta64(1, 'D')

Any help or suggestions are appreciated.

Thank You.
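
The `TypeError` above indicates that `SessionDate` reaches the script as text, so `diff()` ends up subtracting two strings. Below is a minimal sketch of one way around this, assuming the text values can be parsed by `pd.to_datetime`. The helper name `add_days_between` and the sample rows are hypothetical, made up for illustration.

```python
import pandas as pd


def add_days_between(dataset: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with Days_btw = days since the same patient's previous session."""
    out = dataset.copy()
    # Assumption: SessionDate arrives as text, so parse it before subtracting.
    out["SessionDate"] = pd.to_datetime(out["SessionDate"])
    # diff() compares consecutive rows, so order each patient's sessions by date.
    out = out.sort_values(["PatientID", "SessionDate"])
    out["Days_btw"] = (
        out.groupby("PatientID")["SessionDate"]
        .diff()          # timedelta to the previous row within each patient
        .dt.days         # whole days; missing for each patient's first session
        .fillna(0)
        .astype(int)
    )
    return out


# Hypothetical sample rows, with dates given as strings on purpose.
sample = pd.DataFrame(
    {
        "PatientID": [1, 1, 1, 2, 2],
        "SessionDate": [
            "2021-01-01",
            "2021-01-05",
            "2021-01-12",
            "2021-02-01",
            "2021-02-03",
        ],
    }
)
result = add_days_between(sample)
print(result)
```

Inside the Power Query Python step, the equivalent call would presumably be `dataset = add_days_between(dataset)`. Setting each patient's first session to 0 rather than leaving it empty is a choice made for this sketch.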

1 ACCEPTED SOLUTION
MFelix
Super User

Hi @Anonymous ,

 

You can do this using two different approaches: Power Query or DAX.

 

Power Query

  • Sort the table by PatientID and then by SessionDate
  • Add an index column starting at 0
  • Add a custom column with the following formula:
try if [PatientID] = #"Added Index"{[Index]-1}[PatientID] then  [SessionDate] - #"Added Index"{[Index]-1}[SessionDate] else 0 otherwise 0

 

Result and complete code below:

MFelix_0-1639995221388.png

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bc7bCcAwDEPRXfwdiKXmOUvI/mskpaW0Rb8HG90xjCVZMNToOdIJm+HBprALpCuEwqQwKywKVSd/nQ1n5x7CRvQ3FoFOgTwUNvWeL/ysV3XYdRH8xrkA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [PatientID = _t, SessionDate = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"SessionDate", type date}, {"PatientID", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"PatientID", Order.Ascending}, {"SessionDate", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1, Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Days_btw", each try if [PatientID] = #"Added Index"{[Index]-1}[PatientID] then  [SessionDate] - #"Added Index"{[Index]-1}[SessionDate] else 0 otherwise 0),
    #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Days_btw", Int64.Type}})
in
    #"Changed Type1"
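
For readers more familiar with pandas, the sort, index, and compare-with-previous-row steps above can be sketched roughly as follows. This is an illustration of the same idea, not a translation of the M code: `shift(1)` stands in for the `{[Index]-1}` lookup, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample rows.
rows = pd.DataFrame(
    {
        "PatientID": [2, 1, 1, 2, 1],
        "SessionDate": [
            "2021-02-03",
            "2021-01-05",
            "2021-01-01",
            "2021-02-01",
            "2021-01-12",
        ],
    }
)
rows["SessionDate"] = pd.to_datetime(rows["SessionDate"])

# Sort by patient and date, then rebuild a 0-based index.
rows = rows.sort_values(["PatientID", "SessionDate"]).reset_index(drop=True)

# Look at the previous row, much like #"Added Index"{[Index]-1}.
prev = rows.shift(1)
same_patient = rows["PatientID"] == prev["PatientID"]
gap_days = (rows["SessionDate"] - prev["SessionDate"]).dt.days

# Keep the gap only when the previous row belongs to the same patient;
# otherwise (including the very first row) fall back to 0.
rows["Days_btw"] = gap_days.where(same_patient, 0).astype(int)
print(rows)
```

The first row has no predecessor, which is the case the `try ... otherwise 0` wrapper covers in the Power Query version.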

 

DAX

  • Add a calculated column with the following code:
Days_btw_dax = 
COALESCE (
    DATEDIFF (
        CALCULATE (
            MAX ( 'Table (3)'[SessionDate] ),
            FILTER (
                ALL ( 'Table (3)'[PatientID],'Table (3)'[SessionDate] ),
                'Table (3)'[PatientID] = EARLIER ( 'Table (3)'[PatientID] )
                    && 'Table (3)'[SessionDate] < EARLIER ( 'Table (3)'[SessionDate] )
            )
        ),
        'Table (3)'[SessionDate],
        DAY
    ),
    0
)
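
The calculated column above can be read as: for each row, find the latest earlier session date of the same patient, take the difference in days, and use 0 when no earlier session exists. A plain-Python sketch of that reading, with a hypothetical helper `days_since_previous` and made-up sample rows:

```python
from datetime import date

# Hypothetical sample rows: (PatientID, SessionDate), in no particular order.
sessions = [
    (2, date(2021, 2, 3)),
    (1, date(2021, 1, 5)),
    (1, date(2021, 1, 1)),
    (2, date(2021, 2, 1)),
    (1, date(2021, 1, 12)),
]


def days_since_previous(patient_id, session_date, rows):
    """Days between session_date and the same patient's latest earlier session."""
    # Mirrors the FILTER: same patient and a strictly earlier date.
    earlier = [d for pid, d in rows if pid == patient_id and d < session_date]
    if not earlier:
        # Mirrors COALESCE(..., 0): no earlier session for this patient.
        return 0
    # Mirrors MAX(...) followed by DATEDIFF(..., DAY).
    return (session_date - max(earlier)).days


days_btw = [days_since_previous(pid, d, sessions) for pid, d in sessions]
print(days_btw)
```

Because each row searches the whole table for its predecessor, this formulation does not depend on the rows being sorted.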

 

MFelix_1-1639996179799.png

 


Regards

Miguel Félix




