Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

PowerQuery - Merge Queries Transformation Bug - Inconsistent Data After Each Refresh

I encounter a very strange issue/behavior in PBI, which I suspect has something to do with the way merge transformation is used in PowerQuery.

 

Basically, I extract data from SQL Server database to PBI tables/queries. There are 5 queries in total (1 for Date, 3 "dimensions", and 1 "fact").

The source tables do not have obvious key field in them, I concatenate several fields to make up a key field.

Also, I added Index to the table/query in PowerQuery (via Add Column > Index Column).

I did this for the 3 "dimension" tables.

 

Then, for the "fact" table, I performed Merged Queries transformations (LEFT Outer with no fuzzy matching) with the 3 "dimension" tables based on the combination of fields which form the key fields.

From this transformation, I can then obtain the Index field (act like a surrogate key now) from each of the "dimension" tables, so that it replicates a proper star-schema / fact-dimension data model, which I believe is the best practice.

This is the approach that I follow https://radacad.com/build-your-first-star-schema-model-in-action-power-bi-modeling-basics

 

It was all working fine, until I performed data refresh.

Every time I refresh the data, even though the souce has not changed, the data in Power BI desktop keeps changing.

Looking at the Data tab, it seems that the Index keys in the fact table are out of whack, i.e. they dont correspond to the correct Index field value in the dimension tables.

This behaviour is also observed when I refresh data in Power BI Report Server and Power BI Service.

 

One thing that seems to "fix" it is to disable parallel loading in Power BI Desktop (File >Options and Settings > Options > Current File > Data Load > Parallel loading of tables > untick Enable parallel loading of tables).

It makes the data refresh/loading slow, but it seems that the resulting data does not get out of whack after it completes.

This only applies on Power BI Desktop though, i.e. even if I disable parallel loading in PBI Desktop, once it is published to PBIRS or PBI Service, the issue is still there.

 

Another thing that seems to "fix" the issue but only in PowerBI Desktop is as suggested by @ImkeF user in https://community.powerbi.com/t5/Power-Query/Inconsistent-Data-After-Refresh-Merge-Query-Issue/m-p/1... adding Table.Buffer at the end of PowerQuery steps.

 

The workaround:

Since the issue seems to be caused by PowerBI/PowerQuery incorrectly retrieves the Index field values after Merged Queries transformation in the fact table, I removed all the relationships which use these key/index fields.

Instead, I now use the concatenation of multiple fields to form the keys on each tables and join them in the Relationship.

By doing this, no issue with data getting out of whack - PBI Desktop and PBI Service data refresh seem fine.

 

Has anyone had experienced this before?

Is this a known bug?

Status: New
Comments
v-chuncz-msft
Community Support

@Anonymous 

 

Thanks for sharing the workaround. To get faster and better technical support for this issue, you may directly create a support ticket.

rargyle
Advocate II

think  I have found a solution by the power of SQL query. Instead of adding an index column in PowerQuery, after getting the data, I'm using ROW_NUMBER in my database SQL:

 

SELECT ROW_NUMBER() OVER( ORDER BY Column1, Column2) id_Num, Column1, Column2 [...] 

 

So far my indexes have behaved on refreshing in the PowerBI Service. Hopefully it stays that way...!