Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Anonymous
Not applicable

Transform data in PowerQuery instead of using embedded python in PowerQuery

I have a dataset which is intrinsically problematic but have no plan to correct the data source (due to some reasons). Therefore, I need to correct the dataset in PowerQuery. I have done a python script to do so, but found that running a python script in PowerQuery is not efficient. I am seeking a method in PowerQuery to replicate the result.

 

Here is an example table.

Car IDTask IDTypeStart timeFinish time

Car A

123S14/7/2022 10:00:0014/7/2022 10:00:05
Car A124T14/7/2022 10:00:0114/7/2022 10:01:00
Car A124T14/7/2022 10:00:0214/7/2022 10:01:05
Car A125S14/7/2022 10:01:0714/7/2022 10:01:10
Car A126S14/7/2022 10:01:1314/7/2022 10:01:13
Car B234T14/7/2022 09:02:0014/7/2022 09:02:10
Car B234T14/7/2022 09:01:5914/7/2022 09:02:10
Car B235S14/7/2022 09:02:0614/7/2022 09:02:20
Car B236S14/7/2022 09:02:2014/7/2022 09:02:30

 

Two cars, Car A and Car B. Assume the row order is correct, and no change will be on "Car ID", "Task ID", "Type", and "Finish time".

In "Type", "S" means single, so have only one record for its "Task ID"; "T" means twin, so have two records for its "Task ID".

 

Here are some rules:

The problematic column is "Start time".

  1. The "Start time" must happen before "Finish time". If "Start time" = "Finish time", it is incorrect. And because the row order is in time sequence, so one's "Start time" happens before last row's "Finish time", it is incorrect as well. To fix it, it will take the "Finish time" of the last "Task ID" and replace the current "Start time". 
  2. Because Car A and Car B are independent, so the "Start time" and "Finish time" of either will not affect one another. 
  3. In the end, to align the "Start time" of the corresponding "Task ID", "Start time"/"Finish time"of twin should be identical; "Start time" will take the earliest time stamp, and "Finish time" will take the latest time stamp.

 

So the resulting table after transformation will be as below. (those corrected records are highlighted in blue)

Car IDTask IDTypeStart timeFinish time

Car A

123S14/7/2022 10:00:0014/7/2022 10:00:05
Car A124T14/7/2022 10:00:0514/7/2022 10:01:05
Car A124T14/7/2022 10:00:0514/7/2022 10:01:05
Car A125S14/7/2022 10:01:0714/7/2022 10:01:10
Car A126S14/7/2022 10:01:1014/7/2022 10:01:13
Car B234T14/7/2022 09:01:5914/7/2022 09:02:10
Car B234T14/7/2022 09:01:5914/7/2022 09:02:10
Car B235S14/7/2022 09:02:1014/7/2022 09:02:20
Car B236S14/7/2022 09:02:2014/7/2022 09:02:30

 

It is a bit long, and I have no idea whether it is feasible in PowerQuery or not. Hope someone can help and give some advice or hints.

Thank you so much!

2 ACCEPTED SOLUTIONS
latimeria
Solution Specialist
Solution Specialist

Hi @Anonymous ,

If I understood well, this should work:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jZDPCoAgDIdfRTwHupmJ3apHqFt06BV6/0OT/hDbikBEPvf525xnO6yb6WxlAQPtYznVLjn0iAZ868vSYLRL9bRr2ietEASE8uRPGzWbZ0e1cypMCgSe3bzYEHR42T1dY5Cd+0xt81874CP7y4Y25j+2nPvMbhSI3JZz34USBrKXHQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Car ID" = _t, #"Task ID" = _t, Type = _t, #"Start time1" = _t, #"Finish time1" = _t]),
    #"Changed Type" = Table.Buffer(Table.TransformColumnTypes(Source,{{"Start time1", type datetime}, {"Finish time1", type datetime}})),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Task ID", "Car ID"}, {{"Rows", each _, type table [Car ID=nullable text, Task ID=nullable text, Type=nullable text, Start time1=nullable text, Finish time1=nullable text]}}),
    #"Align StartDate for Twin" = Table.TransformColumns(
            #"Grouped Rows", 
            {
                {"Rows", (OuterTable)=> Table.AddColumn(OuterTable,"Start time2", (Innertable)=> List.Min(OuterTable[Start time1]), type datetime)}
            }),
    #"Align EndDate for twin" = Table.TransformColumns(
        #"Align StartDate for Twin",
        {
            {"Rows", (OuterTable)=> Table.AddColumn(OuterTable,"Finish time", (Innertable)=> List.Max(OuterTable[Finish time1]), type datetime)}
        }),
    Custom1 = Table.Combine(#"Align EndDate for twin"[Rows]),
    #"Removed Other Columns" = Table.SelectColumns(Custom1,{"Car ID", "Task ID", "Type", "Start time2", "Finish time"}),
    #"Grouped Rows1" = Table.Group(#"Removed Other Columns", {"Car ID"}, {{"Rows", each _, type table [Car ID=nullable text, Task ID=nullable text, Type=nullable text, Start time1=nullable datetime, Finish time1=nullable datetime, Start time=datetime]}}),
    #"Added Index Column" = Table.TransformColumns(#"Grouped Rows1", {{"Rows", each Table.AddIndexColumn(_, "Index",0)}}),
    #"Added Start time" = Table.TransformColumns(#"Added Index Column", 
        {
            {"Rows", each Table.AddColumn(_,
                "Start time",
                (a)=> 
                try
                    if a[Task ID] = [Task ID]{a[Index]-1} 
                    then null
                    else   
                        if a[Start time2] > [Finish time]{a[Index]-1} 
                        then 
                            if a[Start time2] < a[Finish time]
                            then a[Start time2] 
                            else [Finish time]{a[Index]-1}
                        else [Finish time]{a[Index]-1}
                otherwise 
                    a[Start time2])
            }
        }),
    #"Expanded Rows" = Table.ExpandTableColumn(#"Added Start time", "Rows", {"Task ID", "Type", "Start time2", "Finish time", "Start time"}, {"Task ID", "Type", "Start time2", "Finish time", "Start time"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Expanded Rows",{"Car ID", "Task ID", "Type", "Finish time", "Start time"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns1",{"Car ID", "Task ID", "Type", "Start time", "Finish time"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"Car ID", type text}, {"Task ID", Int64.Type}, {"Type", type text}, {"Start time", type datetime}, {"Finish time", type datetime}}),
    #"Filled Down" = Table.FillDown(#"Changed Type1",{"Start time"})
in
    #"Filled Down"

 (copy and paste in a blank query) 

View solution in original post

Here is some updated DAX for the two columns.

NewStartTime =
VAR vThisStart = CarTasks[Start time]
VAR vThisTaskID = CarTasks[Task ID]
VAR vMinStartThisTask =
    CALCULATE (
        MIN ( CarTasks[Start time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID], CarTasks[Task ID] )
    )
VAR vMaxPrevTaskFinish =
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID] ),
        CarTasks[Task ID] < vThisTaskID
    )
VAR vMinStartOverall =
    MIN ( vThisStart, vMinStartThisTask )
VAR vResult =
    IF (
        CarTasks[Start time] = CarTasks[Finish time],
        vMaxPrevTaskFinish,
        MAX ( vMaxPrevTaskFinish, vMinStartOverall )
    )
RETURN
    vResult


NewFinishTime =
IF (
    CarTasks[Type] = "T",
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID], CarTasks[Task ID] )
    ),
    CarTasks[Finish time]
)

 

Pat

Microsoft Employee

View solution in original post

10 REPLIES 10
latimeria
Solution Specialist
Solution Specialist

Hi @Anonymous ,

If I understood well, this should work:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jZDPCoAgDIdfRTwHupmJ3apHqFt06BV6/0OT/hDbikBEPvf525xnO6yb6WxlAQPtYznVLjn0iAZ868vSYLRL9bRr2ietEASE8uRPGzWbZ0e1cypMCgSe3bzYEHR42T1dY5Cd+0xt81874CP7y4Y25j+2nPvMbhSI3JZz34USBrKXHQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Car ID" = _t, #"Task ID" = _t, Type = _t, #"Start time1" = _t, #"Finish time1" = _t]),
    #"Changed Type" = Table.Buffer(Table.TransformColumnTypes(Source,{{"Start time1", type datetime}, {"Finish time1", type datetime}})),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Task ID", "Car ID"}, {{"Rows", each _, type table [Car ID=nullable text, Task ID=nullable text, Type=nullable text, Start time1=nullable text, Finish time1=nullable text]}}),
    #"Align StartDate for Twin" = Table.TransformColumns(
            #"Grouped Rows", 
            {
                {"Rows", (OuterTable)=> Table.AddColumn(OuterTable,"Start time2", (Innertable)=> List.Min(OuterTable[Start time1]), type datetime)}
            }),
    #"Align EndDate for twin" = Table.TransformColumns(
        #"Align StartDate for Twin",
        {
            {"Rows", (OuterTable)=> Table.AddColumn(OuterTable,"Finish time", (Innertable)=> List.Max(OuterTable[Finish time1]), type datetime)}
        }),
    Custom1 = Table.Combine(#"Align EndDate for twin"[Rows]),
    #"Removed Other Columns" = Table.SelectColumns(Custom1,{"Car ID", "Task ID", "Type", "Start time2", "Finish time"}),
    #"Grouped Rows1" = Table.Group(#"Removed Other Columns", {"Car ID"}, {{"Rows", each _, type table [Car ID=nullable text, Task ID=nullable text, Type=nullable text, Start time1=nullable datetime, Finish time1=nullable datetime, Start time=datetime]}}),
    #"Added Index Column" = Table.TransformColumns(#"Grouped Rows1", {{"Rows", each Table.AddIndexColumn(_, "Index",0)}}),
    #"Added Start time" = Table.TransformColumns(#"Added Index Column", 
        {
            {"Rows", each Table.AddColumn(_,
                "Start time",
                (a)=> 
                try
                    if a[Task ID] = [Task ID]{a[Index]-1} 
                    then null
                    else   
                        if a[Start time2] > [Finish time]{a[Index]-1} 
                        then 
                            if a[Start time2] < a[Finish time]
                            then a[Start time2] 
                            else [Finish time]{a[Index]-1}
                        else [Finish time]{a[Index]-1}
                otherwise 
                    a[Start time2])
            }
        }),
    #"Expanded Rows" = Table.ExpandTableColumn(#"Added Start time", "Rows", {"Task ID", "Type", "Start time2", "Finish time", "Start time"}, {"Task ID", "Type", "Start time2", "Finish time", "Start time"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Expanded Rows",{"Car ID", "Task ID", "Type", "Finish time", "Start time"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns1",{"Car ID", "Task ID", "Type", "Start time", "Finish time"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"Car ID", type text}, {"Task ID", Int64.Type}, {"Type", type text}, {"Start time", type datetime}, {"Finish time", type datetime}}),
    #"Filled Down" = Table.FillDown(#"Changed Type1",{"Start time"})
in
    #"Filled Down"

 (copy and paste in a blank query) 

Anonymous
Not applicable

It is remarkable!! It works!!!

Anonymous
Not applicable

@latimeria Thank you so much. Please let me have a look first, will try in a couple of hours and have to get familiar with m-language first

ppm1
Solution Sage
Solution Sage

While it is possible to do this in the query editor, it would be easier to do it in DAX calculated columns. Below is an example. It works to generate your desired start times for all but task 126 (I don't understand why that one needs to be updated). Hopefully you can adapt the logic to get what you need. A simpler expression can be made for the finished time.

 

ppm1_0-1668348514413.png

NewStartTime =
VAR vThisStart = CarTasks[Start time]
VAR vThisTaskID = CarTasks[Task ID]
VAR vMinStartThisTask =
    CALCULATE (
        MIN ( CarTasks[Start time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID], CarTasks[Task ID] )
    )
VAR vMaxPrevTaskFinish =
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID] ),
        CarTasks[Task ID] < vThisTaskID
    )
VAR vMinStartOverall =
    MIN ( vThisStart, vMinStartThisTask )
VAR vResult =
    MAX ( vMaxPrevTaskFinish, vMinStartOverall )
RETURN
    vResult

 

Pat

 

Microsoft Employee
Anonymous
Not applicable


@ppm1 wrote:

 

 

VAR vThisTaskID = CarTasks[Task ID]

VAR vMaxPrevTaskFinish =
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID] ),
        CarTasks[Task ID] < vThisTaskID
    )

 

 


It is brilliant, I have never thought that DAX is so powerful till you showed me this simple and clear code!

 

But I am wondering why have to include "CarTasks[Task ID] < vThisTask"? As from the "VAR vThisTaskID = CarTasks[Task ID]", "CarTasks[Task ID] < vThisTask" will become "CarTasks[Task ID] < CarTasks[Task ID]".

 

And for the task 126, if it is not allowed to have "Start time" = "Finish time", otherwise, current "Start time" should be replaced by the last "Finish time". What should I add to the logic?

 

Many Thanks!

Here is some updated DAX for the two columns.

NewStartTime =
VAR vThisStart = CarTasks[Start time]
VAR vThisTaskID = CarTasks[Task ID]
VAR vMinStartThisTask =
    CALCULATE (
        MIN ( CarTasks[Start time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID], CarTasks[Task ID] )
    )
VAR vMaxPrevTaskFinish =
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID] ),
        CarTasks[Task ID] < vThisTaskID
    )
VAR vMinStartOverall =
    MIN ( vThisStart, vMinStartThisTask )
VAR vResult =
    IF (
        CarTasks[Start time] = CarTasks[Finish time],
        vMaxPrevTaskFinish,
        MAX ( vMaxPrevTaskFinish, vMinStartOverall )
    )
RETURN
    vResult


NewFinishTime =
IF (
    CarTasks[Type] = "T",
    CALCULATE (
        MAX ( CarTasks[Finish time] ),
        ALLEXCEPT ( CarTasks, CarTasks[Car ID], CarTasks[Task ID] )
    ),
    CarTasks[Finish time]
)

 

Pat

Microsoft Employee
Anonymous
Not applicable

Thank you so much!!! And thanks for adding an additional "Finish time" column.

Anonymous
Not applicable

@ppm1 Thanks so much. Let me implement this logic tonight, in a few hours. I need to get familiar with the DAX first. 

 

To your question about task 126, it is because "start time" can't be the same as "finish time"...... probably I didn't make it clear in the beginning. 

lbendlin
Super User
Super User

 

Assume the row order is correct

 

Can't do that.  Add an index column at the earliest possible opportunity.

 

To fix it, it will take the "Finish time" of the last "Task ID" and replace the current "Start time". 

 

Please explain the rationale behind this rule.  Seems to be slightly weird. It also doesn't seem to match your indicated expected result.

 

Anonymous
Not applicable



@lbendlin wrote:

 

Please explain the rationale behind this rule.  Seems to be slightly weird. It also doesn't seem to match your indicated expected result.

 


Because the source is problematic at only the "Start time", but no fault at the "Finish time", provided that every task is in sequence, meaning no time overlapping between each other. 

 

So once I spot a fault in "Start time", the remedy is to modify the "Start time" by taking out the overlapping session. For those records without overlapping, their "Start time" would be remained unchanged. 

 

Is it possible to implement this idea in PowerBI PowerQuery without using python?

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.