The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hello guys, i need help to replicate the same function i have in sql to power query. I need to bring the result of the previous row of the column "ORGANIZATION_UNIT_ID" partitioned by a cluster which in this case is the "GLOBAL_EMPLOYEE_ID" of the person and ordered by the date (LOAD DATE).
here is the expected result:
LOAD_DATE | GLOBAL_EMPLOYEE_ID | ORGANIZATION_UNIT_ID | Last_Organization_Unit_ID |
31/10/2021 | 2000 | 1 | null |
30/11/2021 | 2000 | 1 | 1 |
31/12/2021 | 2000 | 1 | 1 |
31/01/2022 | 2000 | 2 | 1 |
28/02/2022 | 2000 | 2 | 2 |
31/03/2022 | 2000 | 2 | 2 |
30/04/2022 | 2000 | 1 | 2 |
31/05/2022 | 2000 | 1 | 1 |
30/06/2022 | 2000 | 3 | 1 |
31/10/2021 | 1000 | 1 | null |
30/11/2021 | 1000 | 1 | 1 |
31/12/2021 | 1000 | 2 | 1 |
31/01/2022 | 1000 | 2 | 2 |
28/02/2022 | 1000 | 3 | 2 |
31/03/2022 | 1000 | 5 | 3 |
30/04/2022 | 1000 | 5 | 5 |
31/05/2022 | 1000 | 5 | 5 |
30/06/2022 | 1000 | 5 | 5 |
This is the code I use in sql:
SELECT
LOAD_DATE,
ORGANIZATION_UNIT_ID,
GLOBAL_EMPLOYEE_ID,
LEAD(ORGANIZATION_UNIT_ID, 1)
OVER (PARTITION BY GLOBAL_EMPLOYEE_ID
ORDER BY LOAD_DATE DESC
) AS Last_Org_Unit_ID
FROM Table
I've already tried to merge the table with it even using load date -1, but the mass of data I have is very large and makes the process of becoming a data flow unfeasible
If anyone has any ideas, I'd appreciate it.
You can write that SQL right into a dataflow.
--Nate
I always find that writing the LEAD/LAG SQL is faster than any other way of doing previous row stuff. If you are using PBI Desktop, you can add "EnableFolding = true" to the Value.NativeQuery parameter to keep the query folding.
--Nate
Hi @watkinnc. I totally agree with you. In SQL to do this same treatment takes a maximum of 2 minutes. In power query, it takes hours. The problem with this and the reason is that in a few days the base that I do this in sql will be made available to me only through a data flow.
You could also do the classic indexes starting from zero and then starting from one and then merge the table to itself.
--Nate
See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dc9BDoAgDETRu7Am6UwR4l2I97+GSAxooSsWLxP6aw2JQohCGWJQAO1huGITCLmXtlFP0Dc6RbvoKdCtPJu0FwgOI/OfvJe2KUbSuHqU0i2lW0p79SilW0pzwSx9Ja+lf/mWLpuyynUD", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [LOAD_DATE = _t, GLOBAL_EMPLOYEE_ID = _t, ORGANIZATION_UNIT_ID = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"LOAD_DATE", type text}, {"GLOBAL_EMPLOYEE_ID", Int64.Type}, {"ORGANIZATION_UNIT_ID", Int64.Type}}),
ListOfIDs = List.Buffer(#"Changed Type"[GLOBAL_EMPLOYEE_ID]),
ListOfValues = List.Buffer(#"Changed Type"[ORGANIZATION_UNIT_ID]),
GenList = List.Generate(()=>[x=null,i=0], each [i]<List.Count(ListOfValues), each [i=[i]+1, x=(if ListOfIDs{i}=ListOfIDs{[i]} then ListOfValues{[i]} else null)], each [x]),
Result = Table.FromColumns(Table.ToColumns(#"Changed Type")&{GenList},Table.ColumnNames(#"Changed Type")&{"Last_Organization_Unit_ID"})
in
Result
Hi @Vijay_A_Verma. Thanks a lot for the solution. It brought the expected result, but not in the required performance. I applied it to my dataset and it's been two hours processing the data flow.
I'm not an expert on the subject, but I've been trying to evolve from a grouping made by "Global ID". I can add an index inside the virtual table. The expected is that it was already possible to create a column referencing the previous result. But according to some tutorials I've seen this is only possible by referencing the previous step. (https://www.youtube.com/watch?v=UPddzZnsf5w)
Due to the complexity of the code you gave me, I'm sure you could do something like that. I believe it can be more performative for the data profile I'm working on (a few million lines) kkk.
here is the code i am using:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dc9BDoAgDETRu7Am6UwR4l2I97+GSAxooSsWLxP6aw2JQohCGWJQAO1huGITCLmXtlFP0Dc6RbvoKdCtPJu0FwgOI/OfvJe2KUbSuHqU0i2lW0p79SilW0pzwSx9Ja+lf/mWLpuyynUD", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [LOAD_DATE = _t, GLOBAL_EMPLOYEE_ID = _t, ORGANIZATION_UNIT_ID = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"LOAD_DATE", type date}, {"GLOBAL_EMPLOYEE_ID", Int64.Type}, {"ORGANIZATION_UNIT_ID", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"GLOBAL_EMPLOYEE_ID"}, {{"Data", each _, type table [LOAD_DATE=nullable date, GLOBAL_EMPLOYEE_ID=nullable number, ORGANIZATION_UNIT_ID=nullable number]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "DataIndex", each Table.AddIndexColumn([Data],"Index",1))
in
#"Added Custom"
If I could do something along those lines, that would be great.
in advance, thank you very much
Hi Vijay,
This is great code. Can you explain what's going on in this code?
List.Generate(()=>[x=null,i=0], each [i]<List.Count(ListOfValues), each [i=[i]+1, x=(if ListOfIDs{i}=ListOfIDs{[i]} then ListOfValues{[i]} else null)], each [x])
What I undestood is:
()=>[x=null,i=0] - this creates record object with values x=null and i=0
each [i]<List.Count(ListOfValues) - this checks if iteration is smaller then number of items in list
each [i=[i]+1, x=(if ListOfIDs{i}=ListOfIDs{[i]} - this creates record with column i equal i + 1 and column x - here I don't understand what is happening. ListOfIDs{i} means get element i from list ListOfIDs but what does it does ListOfIDs{[i]} ?
last part returns x calculated earlier.
Thanks in advance for your explanantion.
Artur
i = [i]+1, hence in ListOfIDs{i}=ListOfIDs{[i]}, I can also use ListOfIDs{i}=ListOfIDs{i-1} => I am checking wether current value of list is = previous value of list or not. If it is equal, it means same group continues. If it is unequal, then a new group has started. If a new group is started, then result should be null. If within same group, then value is ListOfValues{[i]} where [i] means i-1. So I could have written it ListOfValues{i-1} as well.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.