Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
DevLearner0
Frequent Visitor

MYSQL -- DISTINCT clause in sql query

Hello,

 

I can't find more info about this, I hope someone here can help me. I have a DB with a table with 44M rows and I need to remove duplicates from it with Power Query (can't do it in the DB). So I'm connecting to that source with a query embedded in the first step of the "Applied Steps" section, the "Source" one.

It would change from 44M to 200k rows so you can see the improvement.

 

The thing is, it doesn't seem possible to include a DISTINCT() clause to one field or the entire query.

 

I'm using MySQL by the way, I believe that is where the issue appears.

 

Thank you so much,

1 ACCEPTED SOLUTION

Table.Distinct or Table.Group can work around that date column (by not including it in either command).

 

You will have to check the performance.  Ideally your Power Query transforms "fold"  (can be converted back into SQL and pushed up into the data source).

View solution in original post

4 REPLIES 4
lbendlin
Super User
Super User

You can use Table.Distinct() without specifying any column, with specifying all columns, or with specifying some columns.  (The latter will produce random results).

 

Trying to deduplicate 44M rows in Power Query is ambitious. It will take a very long time.

 

can't do it in the DB

Please elaborate.

Thank you for your response,

Improving performance in refresh is what I am trying to achieve. I thought that deduplicating in the first step would be better than add another step of Table.Distinct()

 


Please elaborate.


Database table has a date column that cannot be removed, that column makes the duplicates to not being duplicates, but since I don't need that date column the rows are duplicate.

 

 


 

Table.Distinct or Table.Group can work around that date column (by not including it in either command).

 

You will have to check the performance.  Ideally your Power Query transforms "fold"  (can be converted back into SQL and pushed up into the data source).

Table.Distinct or Table.Group can work around that date column (by not including it in either command).

 

You will have to check the performance.  Ideally your Power Query transforms "fold"  (can be converted back into SQL and pushed up into the data source).

Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Kudoed Authors