Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
_Emil_
Frequent Visitor

Getting Random Duplicates after Merge and Expand

I'm using Merge (left outer) on two fields with Expand column, my right side is unique (Check with SQL including case sensitive collation).

 

Before I expand I don't get duplicates, after I expand I get duplicates, however the duplicates rows are identical (using all columns from right side) and appear to be random as after refreshing the table I get different duplicates each time.

 

I have also sorted data in Power Query as final step, and when I look at the duplicated ID, it actually isn't duplicated, but is duplicated in Power BI Report.

 

Applying remove duplicates solves the problem, but I'm trying to understand why merge with expand would produce duplicates. The table is 100k+ rows.

 

Any ideas? 

 

Regards

Emil

 

7 REPLIES 7
vpechersky
New Member

Where you able to find a solution to this? I'm having this problem when merging a large CSV and a large XLS.

v-huizhn-msft
Microsoft Employee
Microsoft Employee

Hi @_Emil_,

The date in left side have duplicate, right? If it is, I try to reproduce the scenario, please review the following.

The first table is left side table, the second is right side table which is unique.

1.png2.png

I merge(left outer) on two tickets, and result as the screenshot below.

3.png4.png

From the result, I get duplicates after expand. This is because there are duplicates, the value in right site must matches them.

If you have any other issue, please feel free to ask.


Best Regards,
Angelia

Hi Angelia,

 

Let me clarify, using your example you have 5 rows on left side, and you get 5 rows as final result, in my case I get 5+ rows, even when right side is unique.

 

So my output is like this:

1. D, D

2. D, D

3. C, C

4. E, E

5. E, E <-- duplicated

6. F, NULL

 

and if I refresh data my duplication would be different like this:

1. D, D

2. D, D

3. C, C

4. C, C <-- duplicated

5. E, E 

6. F, NULL

 

I have this beheviour on several different merges now, in all cases right side is unique, so I don't expect to get more rows on left side, but I do. 

 

 

If I take my duplicated row and filter my left side (1 unique row) than I no longer get the duplication, and if I was doing something wrong (which is possible) I would expect it to be there duplicated.

 

I'll try to play with it a bit more to see if I can narrow down the issue.

 

Regards

Emil

Hi @_Emil_

For your requriement, it should change dynamically. We are unable to achieve your expected result. Thanks for your understanding.

 

Best Regards,
Angelia

Thanks, it seems to happen only with larger volume of data, and affects only very small % of rows, but appears with almost every merge I use that uses different logic, columns (no issues with SQL Left join).

 

I'll try to narrow it down and see if it is something on my side that I'm misunderstanding. 

 

Regards

Emil

 

_Emil_
Frequent Visitor

Hi 

 

I have managed to confirm it's an issue.

 

I have exported my "item" dimension to a CSV and "swapped" Power BI source. The issue goes away.

 

I put back SQL and issue returns and is still random.

 

On 10 runs (refreshes with same the same data) I get:

1) Left side duplicates + blanks

2) no issue

3) no issue

4) Item dimension contains duplicates (which it doesn't)

5) no issue

6) no issue

7) Left side duplicates + blanks

😎 Item dimension contains duplicates (which it doesn't)

9) No issue

10) No issue

 

* (blanks between 670+ to 50k out of 135k unique rows, duplicates between 5k to 30k depending on the run)

 

The only difference I see is different collation left side Latin1_General_CI_AS right side (item dim) Latin1_General_CI_AS

 

I have considered collation when I started having the issue but I don't understand what it is random.

 

I've tried using collate to match the other side (tried on both sides), but it didn't work.

 

Is there a collation setting in Power BI when SQL Server is used as a source?

 

Regards

Emil

Yes, I see the same thing using SQL Server as the source. It's also possible to connect to SQL Server via OLE DB (I haven't checked to see if this resolves the issue). I am just removing duplicates twice, as you said in your first post.

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.