
_Smrithi
New Member

Issue with Self join in Dataflow Gen 2

I have a self-join in one of my Power Query Dataflow Gen2 tables. It is used to determine the gaps between rent conditions. The output of this table is written to a Lakehouse data destination.

Refreshing the Dataflow Gen2 takes 6 hours and then fails with a "Mashup Error".

Detailed error message: 'Couldn't refresh the entity because of an issue with the mashup document MashupException.Error: Evaluation ran out of memory and can't continue. Details: '. Error code: EvaluationOutOfMemoryError.

 

1. I tried writing the intermediate result to a table and referencing that table to apply the rest of the transformations. The DF2 refresh still failed.

2. When I apply a filter so that only one cost centre remains in the table, the DF2 refresh runs successfully.

 

Has anyone experienced this issue before? I know a Power Query merge is not recommended. Can you please let me know if there are any alternative ways to achieve this without using a self-join?

 

Thanks,

Smrithi

8 REPLIES
v-nmadadi-msft
Community Support

Hi @_Smrithi ,

As we haven't heard back from you, we wanted to follow up and check whether the solution provided by the community members resolved the issue. If our responses addressed your query, please mark the reply as the accepted solution and click 'Yes' if you found it helpful.

 

Thanks and regards

v-nmadadi-msft
Community Support

Hi @_Smrithi 

I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.


Thank you.

v-nmadadi-msft
Community Support

Hi @_Smrithi 

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will help other community members with similar problems find the answer faster.

Thank you.

BA_Pete
Super User

Hi @_Smrithi ,

Very difficult to help based on the information provided.

What source(s) are you using? What other transformations are you doing in the query? How many rows are in your table? What gateway are you using? What other refreshes are scheduled at the same time? Etc.

Please have a read through this and help us to help you:
https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/4381... 

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Hi @BA_Pete,

 

Thanks for the response. I have documented my requirement below. Would you take a look and suggest an approach?

Note: My dataflow is in Dataflow Gen2.

Requirement:

I have a table with the date fields "Valid From" and "Valid To".

I aim to find the gaps between the “Valid From” of the current record and the “Valid To” of the previous record.

To achieve this, I am creating indices on the table and self-joining the table based on the indices (a sketch of this pattern is shown below).

Because of the self-join, the dataflow takes too long to refresh, and sometimes the refresh fails.

Can you please suggest an alternative to the self-join that achieves the same result?
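
For illustration only, a minimal sketch of the index-and-merge pattern described above. The source reference (LakehouseTable) and column names are assumptions, not the actual query:

let
    // Assumption: a Lakehouse source table containing "Valid From" / "Valid To" columns
    Source = LakehouseTable,
    // Add a 0-based and a 1-based index so each row can be matched with the previous row
    AddIndex0 = Table.AddIndexColumn(Source, "Index0", 0, 1, Int64.Type),
    AddIndex1 = Table.AddIndexColumn(AddIndex0, "Index1", 1, 1, Int64.Type),
    // Self-join: the current row's Index0 equals the previous row's Index1
    Merged = Table.NestedJoin(AddIndex1, {"Index0"}, AddIndex1, {"Index1"}, "Previous", JoinKind.LeftOuter),
    // Bring the previous row's "Valid To" across as "PrevValidTo"
    Expanded = Table.ExpandTableColumn(Merged, "Previous", {"Valid To"}, {"PrevValidTo"})
in
    Expanded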

 

Please see my answers to your questions,

What source(s) are you using? My source is a Lakehouse.

 

What other transformations are you doing in the query? I have grouping, indexing, custom calculated columns, and a self-join.

I have tested the performance of each step in the transformation; the step that causes the performance issue is the self-join.

 

How many rows are in your table? I have about 5 million rows in the table. 

 

What gateway are you using? I am connecting to Lakehouse which is in the same Workspace as my dataflow.

 

What other refreshes are scheduled at the same time? I am executing this dataflow on an ad hoc basis, when no other dataflow refreshes are running.

 

Thanks,

Smrithi

 

 

Cool. Thanks for the update @_Smrithi .

It does seem strange to be getting this error on only 5M rows. Up to the point that you add the indices for the self-join, is the query maintaining folding, i.e. is 'View data source query' lit up and selectable when you right-click the query step before adding the first index? The more of the query you can get the source to process (assuming a SQL-capable source) before doing your transformations, the lower the memory load.

There are other ways to achieve the same result which should avoid the huge amount of table scanning that a merge does. Here's one method that you can try:

let
    // Sample data: ID, "Valid From", "Valid To" columns stored as compressed JSON text
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUTIw1AciIwMjYyjHDMKJ1UHImyHLW6LKGxroAxFU3thQ39AIKh8LAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Valid From" = _t, #"Valid To" = _t]),
    // Break the table into a list of column-value lists
    Cols = Table.ToColumns(Source),
    // Rebuild the table with one extra column: the last column ("Valid To") shifted down
    // by one row, so each row carries the previous row's "Valid To" without a merge
    addPrevValidTo = Table.FromColumns(Cols & {{null} & List.RemoveLastN(List.Last(Cols),1)}, Table.ColumnNames(Source) & {"PrevValidTo"})
in
    addPrevValidTo

 

The example code above transforms this:

BA_Pete_0-1738307195982.png

 

...to this:

BA_Pete_1-1738307224559.png
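
As a hedged follow-up sketch: once PrevValidTo exists, the gap itself could be computed with one extra step appended inside the let expression above (returning addGapDays in the in clause). This assumes the sample's dates are text in US month/day/year format:

    // Assumption: "Valid From" / PrevValidTo are text dates like "1/7/2025" (US format).
    // Computes the gap in whole days; null for the first row, which has no previous record.
    addGapDays = Table.AddColumn(addPrevValidTo, "GapDays", each
        if [PrevValidTo] = null then null
        else Duration.Days(
            Date.FromText([#"Valid From"], "en-US") - Date.FromText([PrevValidTo], "en-US")
        ), Int64.Type)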

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




@BA_Pete Thanks for providing the solution. Would you let me know whether it still works when the "Key" field in the table below has different values?

 

Key | Valid From | Valid To
1   | 1/1/2025   | 1/5/2025
2   | 1/7/2025   | 1/10/2025
3   | 1/15/2025  | 1/20/2025
4   | 1/25/2025  | 1/30/2025

 

Thank you,

Smrithi

Hi @_Smrithi  ,
Thanks for reaching out to the Microsoft Fabric Community forum.
If you're asking whether the query works with a different ID, then yes, I have tested it, and it functions correctly.

vnmadadimsft_0-1738921848325.png

 


Hope the workaround provided by @BA_Pete was helpful.

If you find this post helpful, please mark it as an "Accept as Solution" and consider giving a KUDOS.
Thanks and Regards
