March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
I cannot explain why this is happening.
I have two tables that I've created index columns on, and I'm attempting to merge them on Index = Index - 1. This all works great, and the indexing is in line with my expectations before I expand the columns in the merge.
In this first example I have date times starting on 1/12/2021 at 12:01, and proceeding in order.
After I expand the columns, I get a change in Index Values, which is easily apparent by referencing the dates displayed. While my first value still starts at 12:01, my new second value is the same date, but now at 6:51pm.
Has anyone else experienced this, and come up with a way to solve for it? I literally thought I was losing my mind when the order kept changing ...
Advanced Editor Code for Reference:
let
Source = MyDatabase,
dbo_FctDialPadOnDutyStats = Source{[Schema="dbo",Item="FctDialPadOnDutyStats"]}[Data],
#"Removed Columns" = Table.RemoveColumns(dbo_FctDialPadOnDutyStats,{"dimCallCenters", "dimDialPadStatus(AvailabilityStatus)", "dimDialPadStatus(OnDutyStatusId)", "dimDialPadTargetTypes", "dimEmails"}),
#"Inserted Date" = Table.AddColumn(#"Removed Columns", "Date", each Date.From([StatusDate]), type date),
#"Sort_Order" = Table.Sort(#"Inserted Date",{{"StatusDate", Order.Ascending}}),
#"Grouped Rows" = Table.Group(Sort_Order, {"EmailId", "Date", "CallCenterId"}, {{"Data", each _, type table [RecordId=number, CallCenterId=number, EmailId=number, OnDutyStatusId=number, AvailabilityStatus=number, TargetId=nullable number, Name=nullable text, StatusDate=nullable datetime, TargetTypeId=number, Date=date]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Partitioned", each Table.AddIndexColumn([Data], "Index", 1, 1)),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"Data"}),
#"Expanded Partitioned" = Table.ExpandTableColumn(#"Removed Columns1", "Partitioned", {"RecordId", "EmailId", "OnDutyStatusId", "AvailabilityStatus", "TargetId", "Name", "StatusDate", "TargetTypeId", "Date", "Index"}, {"Partitioned.RecordId", "Partitioned.EmailId", "Partitioned.OnDutyStatusId", "Partitioned.AvailabilityStatus", "Partitioned.TargetId", "Partitioned.Name", "Partitioned.StatusDate", "Partitioned.TargetTypeId", "Partitioned.Date", "Partitioned.Index"}),
#"Removed Columns2" = Table.RemoveColumns(#"Expanded Partitioned",{"Partitioned.EmailId", "Partitioned.Date"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns2",{{"Partitioned.RecordId", "RecordId"}, {"Partitioned.OnDutyStatusId", "OnDutyStatusId"}, {"Partitioned.AvailabilityStatus", "AvailabilityStatus"}, {"Partitioned.TargetId", "TargetId"}, {"Partitioned.Name", "Name"}, {"Partitioned.StatusDate", "StatusDate"}, {"Partitioned.TargetTypeId", "TargetTypeId"}, {"Partitioned.Index", "RowNumber"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"RowNumber", Int64.Type}, {"CallCenterId", Int64.Type}, {"RecordId", Int64.Type}, {"TargetId", Int64.Type}, {"OnDutyStatusId", Int64.Type}, {"AvailabilityStatus", Int64.Type}, {"StatusDate", type datetime}}),
#"Reordered Columns" = Table.ReorderColumns(#"Changed Type",{"RowNumber", "EmailId", "CallCenterId", "Date", "RecordId", "OnDutyStatusId", "AvailabilityStatus", "TargetId", "Name", "StatusDate", "TargetTypeId"}),
#"Merged Queries" = Table.NestedJoin(#"Reordered Columns", {"RowNumber", "EmailId", "CallCenterId", "Date", "TargetId"}, Nextrow, {"RN-1", "EmailId", "CallCenterId", "Date", "TargetId"}, "Nextrow", JoinKind.LeftOuter),
#"Filtered Rows" = Table.SelectRows(#"Merged Queries", each ([EmailId] = 135) and ([Date] = #date(2021, 1, 12))),
#"Removed Columns3" = Table.RemoveColumns(#"Filtered Rows",{"Name"}),
#"Expanded Nextrow" = Table.ExpandTableColumn(#"Removed Columns3", "Nextrow", {"RecordId", "OnDutyStatusId", "AvailabilityStatus", "StatusDate"}, {"Nextrow.RecordId", "Nextrow.OnDutyStatusId", "Nextrow.AvailabilityStatus", "Nextrow.StatusDate"})
in
#"Expanded Nextrow"
Before you do the merge, wrap both tables in Table.Buffer() and see if that helps.
What I suspect is happening is that your index is based on how the data is sorted, and sorts in Power Query can be affected by other data, like data coming from a merge. Table.Buffer protects against that. It also breaks any query folding, so if you are connecting to a SQL database or other RDBMS, be aware of the potential performance hit.
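Concretely, a sketch of that wrapping against the posted query (reusing its step names; Nextrow is the second query in the merge) might look like:

```m
// Buffer both tables immediately before the merge so their row order
// (and therefore the index values) is pinned in memory.
BufferedLeft  = Table.Buffer(#"Reordered Columns"),
BufferedRight = Table.Buffer(Nextrow),
#"Merged Queries" = Table.NestedJoin(
    BufferedLeft,  {"RowNumber", "EmailId", "CallCenterId", "Date", "TargetId"},
    BufferedRight, {"RN-1", "EmailId", "CallCenterId", "Date", "TargetId"},
    "Nextrow", JoinKind.LeftOuter)
```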
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI Reporting
I don't know why this fixes the issue but it apparently does. Thank you!
It's quite counterintuitive: table A creates an index column, table B is based on A, and yet they end up with different (scrambled) values in the index column.
It prevents external changes from affecting the query. Power Query doesn't always evaluate the transformation steps in the order you create them; it may run them in a different order to optimize performance. Things like merges can affect the order of a table, which is usually fine, unless you are building indexes. Table.Buffer prevents that, but at a cost: the table has to be loaded into memory. If it is large, that can slow your query down, or even halt it if the table is too large for the RAM of the machine running the query.
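One way to apply this in the query above (a sketch, reusing the posted step names) is to buffer immediately after the sort, so the row order the index is built from is pinned down before any later step can disturb it:

```m
// Buffer the sorted table so the order feeding the grouping and the
// per-group index cannot be re-evaluated differently by later steps.
#"Sort_Order" = Table.Buffer(
    Table.Sort(#"Inserted Date", {{"StatusDate", Order.Ascending}})),
```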
Thanks for the explanation. Still, even if the steps are not evaluated in the order we define them, the index column is only created once, so it shouldn't end up with different values on two tables when one just adds a few steps to the other, or is merged with it based on common columns.
Anyway, using Table.Buffer() as a workaround is viable, but I wonder whether there is a better way to normalize a big flat table.
If table A is the original table, I'm currently basing table M (master) on A, dropping the value columns, keeping the "master" data, using distinct to remove duplicate master data, and creating an index column.
Then I base table D (detail) on A again, and join it with M based on the "master" columns, in order to get the M.Index column and thus be able to link M and D in the model. I then drop the master columns on D and keep the value columns and the index reference.
This fails randomly after a refresh (there is no error, but the values make no sense because the index is different on each table), and it only works consistently when using Table.Buffer before the merge.
Is there a better way of dealing with this?
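A sketch of that master/detail split with the buffer applied at the point the index is created (the column names Customer and Region, the index name MasterId, and the source query A are hypothetical placeholders):

```m
let
    Source = A,  // the original flat table
    // M (master): distinct "master" rows plus an index, buffered so the
    // index is materialized once and never re-evaluated
    Master = Table.Buffer(
        Table.AddIndexColumn(
            Table.Distinct(Table.SelectColumns(Source, {"Customer", "Region"})),
            "MasterId", 1, 1)),
    // D (detail): join back to Master on the master columns to pick up MasterId
    Joined = Table.NestedJoin(Source, {"Customer", "Region"},
                              Master, {"Customer", "Region"}, "M", JoinKind.LeftOuter),
    // Keep the value columns and the index reference; drop the master columns
    Detail = Table.RemoveColumns(
        Table.ExpandTableColumn(Joined, "M", {"MasterId"}),
        {"Customer", "Region"})
in
    Detail
```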
Thank you!
I know the index is only created once, but it may not be created when you think it is. Table.Buffer ensures it is.
The best solution is to do this before Power Query. If Table.Buffer works for you, that is great, but if your table has 100M records it won't, and a SQL Server or other data warehouse should be adding the indexes.
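If the source is SQL Server, one way to push the numbering into the database (a sketch; the server name is a placeholder, and the partition keys are taken from the grouping in the posted query) is ROW_NUMBER via Value.NativeQuery:

```m
let
    Source = Sql.Database("myserver", "MyDatabase"),  // hypothetical server name
    // The database computes the per-group row number, so Power Query
    // never has to build (or re-evaluate) an index column at all.
    Indexed = Value.NativeQuery(Source, "
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY EmailId, CallCenterId, CAST(StatusDate AS date)
                   ORDER BY StatusDate) AS RowNumber
        FROM dbo.FctDialPadOnDutyStats")
in
    Indexed
```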
Thank you @edhans for your patience, explanations, and workaround!
I just wanted to point out that following the transformation steps I mentioned above, even if we don't know "when" the index is created, as long as it is created only once, there shouldn't be any problem. The fact that we sometimes end up with scrambled indexes in both tables (with a common ancestor where the index is supposedly created) seems to indicate that the index is actually being created twice, once for each one of the two tables, which could then result in non-matching indexes.
That is quite counterintuitive when you base your transformations on the principle of causality.
I agree with you that ideally the data should come already normalized from the backend, but sometimes you have to work with long flat tables in PQ and need to normalize them a bit in order to have a manageable model. As you mentioned if the tables grow too much there might not be an alternative to normalizing on the backend.
Since I haven't found much information about this anywhere except on this thread, it's probably not identified as a bug yet, so it may take a long time to fix.
Thankfully we can work around it with your Buffer recommendation (even if that may cause other problems).
Cheers!
It is not a bug, I can assure you of that. It might not work like you want it to, but it isn't a bug. Read this article and watch this video by Ben Gribaudo. You'll understand that the data in Power Query is transient, and it is never all of the data like you have with a relational database, until you use Table.Buffer. But you can hit memory issues depending on the size of your table.
How Power Query Thinks: Taking the Mystery Out of Streaming and Query Folding (Video) | Ben Gribaudo
Great @Atlsql - glad it worked!
Your Solution worked for me.
Now, on expanding the merged table, the index column values are as expected.
Life-saving, thank you so much!
That worked! Thanks for the assist.