Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
Atlsql
Frequent Visitor

Change in Index After Expanding Merged Table Columns.

I cannot explain why this is happening. 

I have two tables that I've created Indexes on, and am attempting to merge them based on Index = Index -1.   This all works great, and the indexing seems to be in line with my expectations before I expand the columns in the merge.

In this first example I have date times starting on 1/12/2021 at 12:01, and proceeding in order.

Atlsql_1-1610730416658.png


After I expand the columns, I get a change in Index Values, which is easily apparent by referencing the dates displayed.   While my first value still starts at 12:01, my new second value is the same date, but now at 6:51pm.

Atlsql_2-1610730467737.png

 

Has anyone else experienced this, and come up with a way to solve for it?  I literally thought I was losing my mind when the order kept changing ...

Advanced Editor Code for Reference:
let
Source = MyDatabase
dbo_FctDialPadOnDutyStats = Source{[Schema="dbo",Item="FctDialPadOnDutyStats"]}[Data],
#"Removed Columns" = Table.RemoveColumns(dbo_FctDialPadOnDutyStats,{"dimCallCenters", "dimDialPadStatus(AvailabilityStatus)", "dimDialPadStatus(OnDutyStatusId)", "dimDialPadTargetTypes", "dimEmails"}),
#"Inserted Date" = Table.AddColumn(#"Removed Columns", "Date", each Date.From([StatusDate]), type date),
#"Sort_Order" = Table.Sort(#"Inserted Date",{"StatusDate", Order.Ascending}),
#"Grouped Rows" = Table.Group(Sort_Order, {"EmailId", "Date", "CallCenterId"}, {{"Data", each _, type table [RecordId=number, CallCenterId=number, EmailId=number, OnDutyStatusId=number, AvailabilityStatus=number, TargetId=nullable number, Name=nullable text, StatusDate=nullable datetime, TargetTypeId=number, Date=date]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Partitioned", each Table.AddIndexColumn([Data], "Index", 1, 1)),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"Data"}),
#"Expanded Partitioned" = Table.ExpandTableColumn(#"Removed Columns1", "Partitioned", {"RecordId", "EmailId", "OnDutyStatusId", "AvailabilityStatus", "TargetId", "Name", "StatusDate", "TargetTypeId", "Date", "Index"}, {"Partitioned.RecordId", "Partitioned.EmailId", "Partitioned.OnDutyStatusId", "Partitioned.AvailabilityStatus", "Partitioned.TargetId", "Partitioned.Name", "Partitioned.StatusDate", "Partitioned.TargetTypeId", "Partitioned.Date", "Partitioned.Index"}),
#"Removed Columns2" = Table.RemoveColumns(#"Expanded Partitioned",{"Partitioned.EmailId", "Partitioned.Date"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns2",{{"Partitioned.RecordId", "RecordId"}, {"Partitioned.OnDutyStatusId", "OnDutyStatusId"}, {"Partitioned.AvailabilityStatus", "AvailabilityStatus"}, {"Partitioned.TargetId", "TargetId"}, {"Partitioned.Name", "Name"}, {"Partitioned.StatusDate", "StatusDate"}, {"Partitioned.TargetTypeId", "TargetTypeId"}, {"Partitioned.Index", "RowNumber"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"RowNumber", Int64.Type}, {"CallCenterId", Int64.Type}, {"RecordId", Int64.Type}, {"TargetId", Int64.Type}, {"OnDutyStatusId", Int64.Type}, {"AvailabilityStatus", Int64.Type}, {"StatusDate", type datetime}}),
#"Reordered Columns" = Table.ReorderColumns(#"Changed Type",{"RowNumber", "EmailId", "CallCenterId", "Date", "RecordId", "OnDutyStatusId", "AvailabilityStatus", "TargetId", "Name", "StatusDate", "TargetTypeId"}),
#"Merged Queries" = Table.NestedJoin(#"Reordered Columns", {"RowNumber", "EmailId", "CallCenterId", "Date", "TargetId"}, Nextrow, {"RN-1", "EmailId", "CallCenterId", "Date", "TargetId"}, "Nextrow", JoinKind.LeftOuter),
#"Filtered Rows" = Table.SelectRows(#"Merged Queries", each ([EmailId] = 135) and ([Date] = #date(2021, 1, 12))),
#"Removed Columns3" = Table.RemoveColumns(#"Filtered Rows",{"Name"}),
#"Expanded Nextrow" = Table.ExpandTableColumn(#"Removed Columns3", "Nextrow", {"RecordId", "OnDutyStatusId", "AvailabilityStatus", "StatusDate"}, {"Nextrow.RecordId", "Nextrow.OnDutyStatusId", "Nextrow.AvailabilityStatus", "Nextrow.StatusDate"})
in
#"Expanded Nextrow"

1 ACCEPTED SOLUTION
edhans
Super User
Super User

Before you do the merge, wrap both tables in Table.Buffer() and see if that helps. To do that:

  • Go to each table
  • Go to the last step
  • Go to the formula bar. Whatever is there, add "Table.Buffer(" on the front, and another ")" on the end.
  • Repeat with 2nd table
  • then do the merge.

What I suspect is happening is your index is based on how data is sorted, and sorts in Power Query can be affected by other data, like data coming from a merge. Table.Buffer protects from that. It also breaks any Query folding, so if you are connecting to a SQL DB or other RDBM, be aware of that potential performance hit.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

11 REPLIES 11
agusmba
Advocate II
Advocate II

I don't know why this fixes the issue but it apparently does. Thank you!

It's quite counterintuitive if table A has created an index column, and table B is based on A but they end up with different values in the index column (scrambled).

It prevents external changes from affecting the query. Power Query doesn't always evaluate the transformation steps in the order you create them. It may do them in a different order to optimize performance. Things like merges can affect the order of a table, which is usually fine, unless building indexes. Table.Buffer prevents that, but at a cost. That table has to be loaded into memory. If it is large, it can slow your query down, or even halt the progress if the table is too large for the RAM of the machine running the query.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Thanks for the explanation although even if the order of the steps is not the one we define, the index column is only created once, so it shouldn't have different values on two tables when one just adds a few steps to another, or is merged with it based on common columns.

 

Anyway using Table.buffer() as a workaround is viable, but I wonder whether there is a better way to normalize a big flat table.

 

If table A is the original table, I'm currently basing table M (master) on A, dropping the value columns, keeping the "master" data, using distinct to remove duplicate master data, and creating an index column.

Then I base table D (detail) on A again, and join it with M based on the "master" columns, in order to get the M.index column and thus be able to link M and D in the model. Drop the master columns on D and keep the value colums and the index reference.

This fails randomly after a refresh (there is no error but the values make no sense as the index is different on each table) and only consistently works by using Table.buffer before the merge.

 

Is there a better way of dealing with this?

 

Thank you!

I know the index is only created once, but it may not be created when you think it is. Table.Buffer ensures it is.

The best solution is do this earlier than Power Query. If Table.Buffer works for you that is great, but if your table has 100M records, it won't work and a SQL Server or other data warehouse should be adding the indexes.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Thank you @edhans for your patience, explanations, and workaround!

 

I just wanted to point out that following the transformation steps I mentioned above, even if we don't know "when" the index is created, as long as it is created only once, there shouldn't be any problem. The fact that we sometimes end up with scrambled indexes in both tables (with a common ancestor where the index is supposedly created) seems to indicate that the index is actually being created twice, once for each one of the two tables, which could then result in non-matching indexes.

That is quite counterintuitive when you base your transformations on the principle of causality.

 

I agree with you that ideally the data should come already normalized from the backend, but sometimes you have to work with long flat tables in PQ and need to normalize them a bit in order to have a manageable model. As you mentioned if the tables grow too much there might not be an alternative to normalizing on the backend.

 

Since I haven found much information about this anywhere except on this thread, it's probably not identified as a bug yet, so it may take a long time to fix.

Thankfully we can work around it with your Buffer recommendation (even if that may cause other problems).

 

Cheers!

It is not a bug. I can assure you of that. It might not work like you want it to, but it isn't a bug. Read this article and watch this video by Ben Gribaudo. You'll understand the data in Power Query is transient, and it is never all of the data like you have with a relational database. Until you use Table.Buffer. But you can hit memory issues depending on the size of your table.

How Power Query Thinks: Taking the Mystery Out of Streaming and Query Folding (Video) | Ben Gribaudo



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
edhans
Super User
Super User

Great @Atlsql - glad it worked!



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
edhans
Super User
Super User

Before you do the merge, wrap both tables in Table.Buffer() and see if that helps. To do that:

  • Go to each table
  • Go to the last step
  • Go to the formula bar. Whatever is there, add "Table.Buffer(" on the front, and another ")" on the end.
  • Repeat with 2nd table
  • then do the merge.

What I suspect is happening is your index is based on how data is sorted, and sorts in Power Query can be affected by other data, like data coming from a merge. Table.Buffer protects from that. It also breaks any Query folding, so if you are connecting to a SQL DB or other RDBM, be aware of that potential performance hit.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Your Solution worked for me.

Now on expanding the merged table. Index column values are as expected.

Life-saving thank you so much

Atlsql
Frequent Visitor

That worked!  Thanks for the assist.

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.