March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
I'm using Merge (left outer) on two fields with Expand column, my right side is unique (Check with SQL including case sensitive collation).
Before I expand I don't get duplicates, after I expand I get duplicates, however the duplicates rows are identical (using all columns from right side) and appear to be random as after refreshing the table I get different duplicates each time.
I have also sorted data in Power Query as final step, and when I look at the duplicated ID, it actually isn't duplicated, but is duplicated in Power BI Report.
Applying remove duplicates solves the problem, but I'm trying to understand why merge with expand would produce duplicates. The table is 100k+ rows.
Any ideas?
Regards
Emil
Where you able to find a solution to this? I'm having this problem when merging a large CSV and a large XLS.
Hi @_Emil_,
The date in left side have duplicate, right? If it is, I try to reproduce the scenario, please review the following.
The first table is left side table, the second is right side table which is unique.
I merge(left outer) on two tickets, and result as the screenshot below.
From the result, I get duplicates after expand. This is because there are duplicates, the value in right site must matches them.
If you have any other issue, please feel free to ask.
Best Regards,
Angelia
Hi Angelia,
Let me clarify, using your example you have 5 rows on left side, and you get 5 rows as final result, in my case I get 5+ rows, even when right side is unique.
So my output is like this:
1. D, D
2. D, D
3. C, C
4. E, E
5. E, E <-- duplicated
6. F, NULL
and if I refresh data my duplication would be different like this:
1. D, D
2. D, D
3. C, C
4. C, C <-- duplicated
5. E, E
6. F, NULL
I have this beheviour on several different merges now, in all cases right side is unique, so I don't expect to get more rows on left side, but I do.
If I take my duplicated row and filter my left side (1 unique row) than I no longer get the duplication, and if I was doing something wrong (which is possible) I would expect it to be there duplicated.
I'll try to play with it a bit more to see if I can narrow down the issue.
Regards
Emil
Hi @_Emil_,
For your requriement, it should change dynamically. We are unable to achieve your expected result. Thanks for your understanding.
Best Regards,
Angelia
Thanks, it seems to happen only with larger volume of data, and affects only very small % of rows, but appears with almost every merge I use that uses different logic, columns (no issues with SQL Left join).
I'll try to narrow it down and see if it is something on my side that I'm misunderstanding.
Regards
Emil
Hi
I have managed to confirm it's an issue.
I have exported my "item" dimension to a CSV and "swapped" Power BI source. The issue goes away.
I put back SQL and issue returns and is still random.
On 10 runs (refreshes with same the same data) I get:
1) Left side duplicates + blanks
2) no issue
3) no issue
4) Item dimension contains duplicates (which it doesn't)
5) no issue
6) no issue
7) Left side duplicates + blanks
😎 Item dimension contains duplicates (which it doesn't)
9) No issue
10) No issue
* (blanks between 670+ to 50k out of 135k unique rows, duplicates between 5k to 30k depending on the run)
The only difference I see is different collation left side Latin1_General_CI_AS right side (item dim) Latin1_General_CI_AS
I have considered collation when I started having the issue but I don't understand what it is random.
I've tried using collate to match the other side (tried on both sides), but it didn't work.
Is there a collation setting in Power BI when SQL Server is used as a source?
Regards
Emil
Yes, I see the same thing using SQL Server as the source. It's also possible to connect to SQL Server via OLE DB (I haven't checked to see if this resolves the issue). I am just removing duplicates twice, as you said in your first post.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
119 | |
88 | |
73 | |
67 | |
49 |
User | Count |
---|---|
199 | |
140 | |
96 | |
77 | |
68 |