Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The ultimate Microsoft Fabric, Power BI, Azure AI & SQL learning event! Join us in Las Vegas from March 26-28, 2024. Use code MSCUST for a $100 discount. Register Now

Reply
fabriciofds
Frequent Visitor

Lead function, with partition by and order by in Power Query

Hello guys, i need help to replicate the same function i have in sql to power query. I need to bring the result of the previous row of the column "ORGANIZATION_UNIT_ID" partitioned by a cluster which in this case is the "GLOBAL_EMPLOYEE_ID" of the person and ordered by the date (LOAD DATE).

 

here is the expected result:

 

LOAD_DATEGLOBAL_EMPLOYEE_IDORGANIZATION_UNIT_IDLast_Organization_Unit_ID
31/10/202120001null
30/11/2021200011
31/12/2021200011
31/01/2022200021
28/02/2022200022
31/03/2022200022
30/04/2022200012
31/05/2022200011
30/06/2022200031
31/10/202110001null
30/11/2021100011
31/12/2021100021
31/01/2022100022
28/02/2022100032
31/03/2022100053
30/04/2022100055
31/05/2022100055
30/06/2022100055

 

This is the code I use in sql:

 

 

SELECT
  LOAD_DATE,
  ORGANIZATION_UNIT_ID,
  GLOBAL_EMPLOYEE_ID,
  LEAD(ORGANIZATION_UNIT_ID, 1)
  OVER (PARTITION BY GLOBAL_EMPLOYEE_ID
  ORDER BY LOAD_DATE DESC
  ) AS Last_Org_Unit_ID

FROM Table

 

 I've already tried to merge the table with it even using load date -1, but the mass of data I have is very large and makes the process of becoming a data flow unfeasible

 

If anyone has any ideas, I'd appreciate it.

8 REPLIES 8
watkinnc
Super User
Super User

You can write that SQL right into a dataflow.

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!
watkinnc
Super User
Super User

I always find that writing the LEAD/LAG SQL is faster than any other way of doing previous row stuff.  If you are using PBI Desktop, you can add "EnableFolding = true" to the Value.NativeQuery parameter to keep the query folding.

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Hi @watkinnc. I totally agree with you. In SQL to do this same treatment takes a maximum of 2 minutes. In power query, it takes hours. The problem with this and the reason is that in a few days the base that I do this in sql will be made available to me only through a data flow.  

watkinnc
Super User
Super User

You could also do the classic indexes starting from zero and then starting from one and then merge the table to itself.

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!
Vijay_A_Verma
Super User
Super User

See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dc9BDoAgDETRu7Am6UwR4l2I97+GSAxooSsWLxP6aw2JQohCGWJQAO1huGITCLmXtlFP0Dc6RbvoKdCtPJu0FwgOI/OfvJe2KUbSuHqU0i2lW0p79SilW0pzwSx9Ja+lf/mWLpuyynUD", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [LOAD_DATE = _t, GLOBAL_EMPLOYEE_ID = _t, ORGANIZATION_UNIT_ID = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"LOAD_DATE", type text}, {"GLOBAL_EMPLOYEE_ID", Int64.Type}, {"ORGANIZATION_UNIT_ID", Int64.Type}}),
    ListOfIDs = List.Buffer(#"Changed Type"[GLOBAL_EMPLOYEE_ID]),
    ListOfValues = List.Buffer(#"Changed Type"[ORGANIZATION_UNIT_ID]),
    GenList = List.Generate(()=>[x=null,i=0], each [i]<List.Count(ListOfValues), each [i=[i]+1, x=(if ListOfIDs{i}=ListOfIDs{[i]} then ListOfValues{[i]} else null)], each [x]),
    Result = Table.FromColumns(Table.ToColumns(#"Changed Type")&{GenList},Table.ColumnNames(#"Changed Type")&{"Last_Organization_Unit_ID"})
in
    Result

Hi @Vijay_A_VermaThanks a lot for the solution. It brought the expected result, but not in the required performance. I applied it to my dataset and it's been two hours processing the data flow.

 

I'm not an expert on the subject, but I've been trying to evolve from a grouping made by "Global ID". I can add an index inside the virtual table. The expected is that it was already possible to create a column referencing the previous result. But according to some tutorials I've seen this is only possible by referencing the previous step. (https://www.youtube.com/watch?v=UPddzZnsf5w)

 

Due to the complexity of the code you gave me, I'm sure you could do something like that. I believe it can be more performative for the data profile I'm working on (a few million lines) kkk.

 

here is the code i am using:

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dc9BDoAgDETRu7Am6UwR4l2I97+GSAxooSsWLxP6aw2JQohCGWJQAO1huGITCLmXtlFP0Dc6RbvoKdCtPJu0FwgOI/OfvJe2KUbSuHqU0i2lW0p79SilW0pzwSx9Ja+lf/mWLpuyynUD", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [LOAD_DATE = _t, GLOBAL_EMPLOYEE_ID = _t, ORGANIZATION_UNIT_ID = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"LOAD_DATE", type date}, {"GLOBAL_EMPLOYEE_ID", Int64.Type}, {"ORGANIZATION_UNIT_ID", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"GLOBAL_EMPLOYEE_ID"}, {{"Data", each _, type table [LOAD_DATE=nullable date, GLOBAL_EMPLOYEE_ID=nullable number, ORGANIZATION_UNIT_ID=nullable number]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "DataIndex", each Table.AddIndexColumn([Data],"Index",1))
in
    #"Added Custom"

 

 

 

 

If I could do something along those lines, that would be great.

 

in advance, thank you very much

Hi Vijay,

This is great code. Can you explain what's going on in this code?

List.Generate(()=>[x=null,i=0], each [i]<List.Count(ListOfValues), each [i=[i]+1, x=(if ListOfIDs{i}=ListOfIDs{[i]} then ListOfValues{[i]} else null)], each [x])

What I undestood is:

()=>[x=null,i=0] - this creates record object with values x=null and i=0

each [i]<List.Count(ListOfValues) - this checks if iteration is smaller then number of items in list

each [i=[i]+1, x=(if ListOfIDs{i}=ListOfIDs{[i]} - this creates record with column i equal i + 1 and column x - here I don't understand what is happening. ListOfIDs{i} means get element i from list ListOfIDs but what does it does ListOfIDs{[i]} ?

last part returns x calculated earlier.

Thanks in advance for your explanantion.

Artur

i = [i]+1, hence in ListOfIDs{i}=ListOfIDs{[i]}, I can also use ListOfIDs{i}=ListOfIDs{i-1} => I am checking wether current value of list is = previous value of list or not. If it is equal, it means same group continues. If it is unequal, then a new group has started. If a new group is started, then result should be null. If within same group, then value is ListOfValues{[i]} where [i] means i-1. So I could have written it ListOfValues{i-1} as well. 

Helpful resources

Announcements
Fabric Community Conference

Microsoft Fabric Community Conference

Join us at our first-ever Microsoft Fabric Community Conference, March 26-28, 2024 in Las Vegas with 100+ sessions by community experts and Microsoft engineering.

February 2024 Update Carousel

Power BI Monthly Update - February 2024

Check out the February 2024 Power BI update to learn about new features.

Fabric Career Hub

Microsoft Fabric Career Hub

Explore career paths and learn resources in Fabric.

Fabric Partner Community

Microsoft Fabric Partner Community

Engage with the Fabric engineering team, hear of product updates, business opportunities, and resources in the Fabric Partner Community.

Top Solution Authors
Top Kudoed Authors