Advance your Data & AI career with 50 days of live learning, dataviz contests, hands-on challenges, study groups & certifications and more!
Get registeredGet Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now
Hello,
I can't find more info about this, I hope someone here can help me. I have a DB with a table with 44M rows and I need to remove duplicates from it with Power Query (can't do it in the DB). So I'm connecting to that source with a query embedded in the first step of the "Applied Steps" section, the "Source" one.
It would change from 44M to 200k rows so you can see the improvement.
The thing is, it doesn't seem possible to include a DISTINCT() clause to one field or the entire query.
I'm using MySQL by the way, I believe that is where the issue appears.
Thank you so much,
Solved! Go to Solution.
Table.Distinct or Table.Group can work around that date column (by not including it in either command).
You will have to check the performance. Ideally your Power Query transforms "fold" (can be converted back into SQL and pushed up into the data source).
You can use Table.Distinct() without specifying any column, with specifying all columns, or with specifying some columns. (The latter will produce random results).
Trying to deduplicate 44M rows in Power Query is ambitious. It will take a very long time.
can't do it in the DB
Please elaborate.
Thank you for your response,
Improving performance in refresh is what I am trying to achieve. I thought that deduplicating in the first step would be better than add another step of Table.Distinct()
Please elaborate.
Database table has a date column that cannot be removed, that column makes the duplicates to not being duplicates, but since I don't need that date column the rows are duplicate.
Table.Distinct or Table.Group can work around that date column (by not including it in either command).
You will have to check the performance. Ideally your Power Query transforms "fold" (can be converted back into SQL and pushed up into the data source).
Table.Distinct or Table.Group can work around that date column (by not including it in either command).
You will have to check the performance. Ideally your Power Query transforms "fold" (can be converted back into SQL and pushed up into the data source).
Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!
Check out the October 2025 Power BI update to learn about new features.