Solved: Transform data in PowerQuery instead of using embe...

Anonymous · ‎11-11-2022

I have a dataset which is intrinsically problematic but have no plan to correct the data source (due to some reasons). Therefore, I need to correct the dataset in PowerQuery. I have done a python script to do so, but found that running a python script in PowerQuery is not efficient. I am seeking a method in PowerQuery to replicate the result.

Here is an example table.

Car ID	Task ID	Type	Start time	Finish time
Car A	123	S	14/7/2022 10:00:00	14/7/2022 10:00:05
Car A	124	T	14/7/2022 10:00:01	14/7/2022 10:01:00
Car A	124	T	14/7/2022 10:00:02	14/7/2022 10:01:05
Car A	125	S	14/7/2022 10:01:07	14/7/2022 10:01:10
Car A	126	S	14/7/2022 10:01:13	14/7/2022 10:01:13
Car B	234	T	14/7/2022 09:02:00	14/7/2022 09:02:10
Car B	234	T	14/7/2022 09:01:59	14/7/2022 09:02:10
Car B	235	S	14/7/2022 09:02:06	14/7/2022 09:02:20
Car B	236	S	14/7/2022 09:02:20	14/7/2022 09:02:30

Two cars, Car A and Car B. Assume the row order is correct, and no change will be on "Car ID", "Task ID", "Type", and "Finish time".

In "Type", "S" means single, so have only one record for its "Task ID"; "T" means twin, so have two records for its "Task ID".

Here are some rules:

The problematic column is "Start time".

The "Start time" must happen before "Finish time". If "Start time" = "Finish time", it is incorrect. And because the row order is in time sequence, so one's "Start time" happens before last row's "Finish time", it is incorrect as well. To fix it, it will take the "Finish time" of the last "Task ID" and replace the current "Start time".
Because Car A and Car B are independent, so the "Start time" and "Finish time" of either will not affect one another.
In the end, to align the "Start time" of the corresponding "Task ID", "Start time"/"Finish time"of twin should be identical; "Start time" will take the earliest time stamp, and "Finish time" will take the latest time stamp.

So the resulting table after transformation will be as below. (those corrected records are highlighted in blue)

Car ID	Task ID	Type	Start time	Finish time
Car A	123	S	14/7/2022 10:00:00	14/7/2022 10:00:05
Car A	124	T	14/7/2022 10:00:05	14/7/2022 10:01:05
Car A	124	T	14/7/2022 10:00:05	14/7/2022 10:01:05
Car A	125	S	14/7/2022 10:01:07	14/7/2022 10:01:10
Car A	126	S	14/7/2022 10:01:10	14/7/2022 10:01:13
Car B	234	T	14/7/2022 09:01:59	14/7/2022 09:02:10
Car B	234	T	14/7/2022 09:01:59	14/7/2022 09:02:10
Car B	235	S	14/7/2022 09:02:10	14/7/2022 09:02:20
Car B	236	S	14/7/2022 09:02:20	14/7/2022 09:02:30

It is a bit long, and I have no idea whether it is feasible in PowerQuery or not. Hope someone can help and give some advice or hints.

Thank you so much!

latimeria · ‎11-13-2022

Hi @Anonymous ,

If I understood well, this should work:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jZDPCoAgDIdfRTwHupmJ3apHqFt06BV6/0OT/hDbikBEPvf525xnO6yb6WxlAQPtYznVLjn0iAZ868vSYLRL9bRr2ietEASE8uRPGzWbZ0e1cypMCgSe3bzYEHR42T1dY5Cd+0xt81874CP7y4Y25j+2nPvMbhSI3JZz34USBrKXHQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Car ID" = _t, #"Task ID" = _t, Type = _t, #"Start time1" = _t, #"Finish time1" = _t]),
    #"Changed Type" = Table.Buffer(Table.TransformColumnTypes(Source,{{"Start time1", type datetime}, {"Finish time1", type datetime}})),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Task ID", "Car ID"}, {{"Rows", each _, type table [Car ID=nullable text, Task ID=nullable text, Type=nullable text, Start time1=nullable text, Finish time1=nullable text]}}),
    #"Align StartDate for Twin" = Table.TransformColumns(
            #"Grouped Rows", 
            {
                {"Rows", (OuterTable)=> Table.AddColumn(OuterTable,"Start time2", (Innertable)=> List.Min(OuterTable[Start time1]), type datetime)}
            }),
    #"Align EndDate for twin" = Table.TransformColumns(
        #"Align StartDate for Twin",
        {
            {"Rows", (OuterTable)=> Table.AddColumn(OuterTable,"Finish time", (Innertable)=> List.Max(OuterTable[Finish time1]), type datetime)}
        }),
    Custom1 = Table.Combine(#"Align EndDate for twin"[Rows]),
    #"Removed Other Columns" = Table.SelectColumns(Custom1,{"Car ID", "Task ID", "Type", "Start time2", "Finish time"}),
    #"Grouped Rows1" = Table.Group(#"Removed Other Columns", {"Car ID"}, {{"Rows", each _, type table [Car ID=nullable text, Task ID=nullable text, Type=nullable text, Start time1=nullable datetime, Finish time1=nullable datetime, Start time=datetime]}}),
    #"Added Index Column" = Table.TransformColumns(#"Grouped Rows1", {{"Rows", each Table.AddIndexColumn(_, "Index",0)}}),
    #"Added Start time" = Table.TransformColumns(#"Added Index Column", 
        {
            {"Rows", each Table.AddColumn(_,
                "Start time",
                (a)=> 
                try
                    if a[Task ID] = [Task ID]{a[Index]-1} 
                    then null
                    else   
                        if a[Start time2] > [Finish time]{a[Index]-1} 
                        then 
                            if a[Start time2] < a[Finish time]
                            then a[Start time2] 
                            else [Finish time]{a[Index]-1}
                        else [Finish time]{a[Index]-1}
                otherwise 
                    a[Start time2])
            }
        }),
    #"Expanded Rows" = Table.ExpandTableColumn(#"Added Start time", "Rows", {"Task ID", "Type", "Start time2", "Finish time", "Start time"}, {"Task ID", "Type", "Start time2", "Finish time", "Start time"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Expanded Rows",{"Car ID", "Task ID", "Type", "Finish time", "Start time"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns1",{"Car ID", "Task ID", "Type", "Start time", "Finish time"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"Car ID", type text}, {"Task ID", Int64.Type}, {"Type", type text}, {"Start time", type datetime}, {"Finish time", type datetime}}),
    #"Filled Down" = Table.FillDown(#"Changed Type1",{"Start time"})
in
    #"Filled Down"

(copy and paste in a blank query)

View solution in original post

ppm1 · ‎11-14-2022

Here is some updated DAX for the two columns.

NewStartTime =
VAR vThisStart = CarTasks[Start time]
VAR vThisTaskID = CarTasks[Task ID]
VAR vMinStartThisTask =
    CALCULATE (
        MIN ( CarTasks[Start time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID], CarTasks[Task ID] )
    )
VAR vMaxPrevTaskFinish =
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID] ),
        CarTasks[Task ID] < vThisTaskID
    )
VAR vMinStartOverall =
    MIN ( vThisStart, vMinStartThisTask )
VAR vResult =
    IF (
        CarTasks[Start time] = CarTasks[Finish time],
        vMaxPrevTaskFinish,
        MAX ( vMaxPrevTaskFinish, vMinStartOverall )
    )
RETURN
    vResult


NewFinishTime =
IF (
    CarTasks[Type] = "T",
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID], CarTasks[Task ID] )
    ),
    CarTasks[Finish time]
)

Pat

Microsoft Employee

View solution in original post

latimeria · ‎11-13-2022

Hi @Anonymous ,

If I understood well, this should work:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jZDPCoAgDIdfRTwHupmJ3apHqFt06BV6/0OT/hDbikBEPvf525xnO6yb6WxlAQPtYznVLjn0iAZ868vSYLRL9bRr2ietEASE8uRPGzWbZ0e1cypMCgSe3bzYEHR42T1dY5Cd+0xt81874CP7y4Y25j+2nPvMbhSI3JZz34USBrKXHQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Car ID" = _t, #"Task ID" = _t, Type = _t, #"Start time1" = _t, #"Finish time1" = _t]),
    #"Changed Type" = Table.Buffer(Table.TransformColumnTypes(Source,{{"Start time1", type datetime}, {"Finish time1", type datetime}})),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Task ID", "Car ID"}, {{"Rows", each _, type table [Car ID=nullable text, Task ID=nullable text, Type=nullable text, Start time1=nullable text, Finish time1=nullable text]}}),
    #"Align StartDate for Twin" = Table.TransformColumns(
            #"Grouped Rows", 
            {
                {"Rows", (OuterTable)=> Table.AddColumn(OuterTable,"Start time2", (Innertable)=> List.Min(OuterTable[Start time1]), type datetime)}
            }),
    #"Align EndDate for twin" = Table.TransformColumns(
        #"Align StartDate for Twin",
        {
            {"Rows", (OuterTable)=> Table.AddColumn(OuterTable,"Finish time", (Innertable)=> List.Max(OuterTable[Finish time1]), type datetime)}
        }),
    Custom1 = Table.Combine(#"Align EndDate for twin"[Rows]),
    #"Removed Other Columns" = Table.SelectColumns(Custom1,{"Car ID", "Task ID", "Type", "Start time2", "Finish time"}),
    #"Grouped Rows1" = Table.Group(#"Removed Other Columns", {"Car ID"}, {{"Rows", each _, type table [Car ID=nullable text, Task ID=nullable text, Type=nullable text, Start time1=nullable datetime, Finish time1=nullable datetime, Start time=datetime]}}),
    #"Added Index Column" = Table.TransformColumns(#"Grouped Rows1", {{"Rows", each Table.AddIndexColumn(_, "Index",0)}}),
    #"Added Start time" = Table.TransformColumns(#"Added Index Column", 
        {
            {"Rows", each Table.AddColumn(_,
                "Start time",
                (a)=> 
                try
                    if a[Task ID] = [Task ID]{a[Index]-1} 
                    then null
                    else   
                        if a[Start time2] > [Finish time]{a[Index]-1} 
                        then 
                            if a[Start time2] < a[Finish time]
                            then a[Start time2] 
                            else [Finish time]{a[Index]-1}
                        else [Finish time]{a[Index]-1}
                otherwise 
                    a[Start time2])
            }
        }),
    #"Expanded Rows" = Table.ExpandTableColumn(#"Added Start time", "Rows", {"Task ID", "Type", "Start time2", "Finish time", "Start time"}, {"Task ID", "Type", "Start time2", "Finish time", "Start time"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Expanded Rows",{"Car ID", "Task ID", "Type", "Finish time", "Start time"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns1",{"Car ID", "Task ID", "Type", "Start time", "Finish time"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"Car ID", type text}, {"Task ID", Int64.Type}, {"Type", type text}, {"Start time", type datetime}, {"Finish time", type datetime}}),
    #"Filled Down" = Table.FillDown(#"Changed Type1",{"Start time"})
in
    #"Filled Down"

(copy and paste in a blank query)

Anonymous · ‎11-14-2022

It is remarkable!! It works!!!

Anonymous · ‎11-14-2022

@latimeria Thank you so much. Please let me have a look first, will try in a couple of hours and have to get familiar with m-language first

ppm1 · ‎11-13-2022

While it is possible to do this in the query editor, it would be easier to do it in DAX calculated columns. Below is an example. It works to generate your desired start times for all but task 126 (I don't understand why that one needs to be updated). Hopefully you can adapt the logic to get what you need. A simpler expression can be made for the finished time.

NewStartTime =
VAR vThisStart = CarTasks[Start time]
VAR vThisTaskID = CarTasks[Task ID]
VAR vMinStartThisTask =
    CALCULATE (
        MIN ( CarTasks[Start time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID], CarTasks[Task ID] )
    )
VAR vMaxPrevTaskFinish =
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID] ),
        CarTasks[Task ID] < vThisTaskID
    )
VAR vMinStartOverall =
    MIN ( vThisStart, vMinStartThisTask )
VAR vResult =
    MAX ( vMaxPrevTaskFinish, vMinStartOverall )
RETURN
    vResult

Pat

Microsoft Employee

Anonymous · ‎11-14-2022

@ppm1 wrote:

VAR vThisTaskID = CarTasks[Task ID]

VAR vMaxPrevTaskFinish =
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID] ),
        CarTasks[Task ID] < vThisTaskID
    )

It is brilliant, I have never thought that DAX is so powerful till you showed me this simple and clear code!

But I am wondering why have to include "CarTasks[Task ID] < vThisTask"? As from the "VAR vThisTaskID = CarTasks[Task ID]", "CarTasks[Task ID] < vThisTask" will become "CarTasks[Task ID] < CarTasks[Task ID]".

And for the task 126, if it is not allowed to have "Start time" = "Finish time", otherwise, current "Start time" should be replaced by the last "Finish time". What should I add to the logic?

Many Thanks!

ppm1 · ‎11-14-2022

Here is some updated DAX for the two columns.

NewStartTime =
VAR vThisStart = CarTasks[Start time]
VAR vThisTaskID = CarTasks[Task ID]
VAR vMinStartThisTask =
    CALCULATE (
        MIN ( CarTasks[Start time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID], CarTasks[Task ID] )
    )
VAR vMaxPrevTaskFinish =
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID] ),
        CarTasks[Task ID] < vThisTaskID
    )
VAR vMinStartOverall =
    MIN ( vThisStart, vMinStartThisTask )
VAR vResult =
    IF (
        CarTasks[Start time] = CarTasks[Finish time],
        vMaxPrevTaskFinish,
        MAX ( vMaxPrevTaskFinish, vMinStartOverall )
    )
RETURN
    vResult


NewFinishTime =
IF (
    CarTasks[Type] = "T",
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID], CarTasks[Task ID] )
    ),
    CarTasks[Finish time]
)

Pat

Microsoft Employee

Anonymous · ‎11-14-2022

Thank you so much!!! And thanks for adding an additional "Finish time" column.

Anonymous · ‎11-14-2022

@ppm1 Thanks so much. Let me implement this logic tonight, in a few hours. I need to get familiar with the DAX first.

To your question about task 126, it is because "start time" can't be the same as "finish time"...... probably I didn't make it clear in the beginning.

lbendlin · ‎11-12-2022

Assume the row order is correct

Can't do that. Add an index column at the earliest possible opportunity.

To fix it, it will take the "Finish time" of the last "Task ID" and replace the current "Start time".

Please explain the rationale behind this rule. Seems to be slightly weird. It also doesn't seem to match your indicated expected result.

Anonymous · ‎11-13-2022

@lbendlin wrote:

Please explain the rationale behind this rule. Seems to be slightly weird. It also doesn't seem to match your indicated expected result.

Because the source is problematic at only the "Start time", but no fault at the "Finish time", provided that every task is in sequence, meaning no time overlapping between each other.

So once I spot a fault in "Start time", the remedy is to modify the "Start time" by taking out the overlapping session. For those records without overlapping, their "Start time" would be remained unchanged.

Is it possible to implement this idea in PowerBI PowerQuery without using python?

Transform data in PowerQuery instead of using embedded python in PowerQuery

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025

How to Get Your Question Answered Quickly

Join the #PBI10 DataViz contest

Transform data in PowerQuery instead of using embedded python in PowerQuery

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025

How to Get Your Question Answered Quickly