Yggdrasill
Responsive Resident

Remove duplicate values in Fabric Data Warehouse SQL table using Stored Procedure

After successfully creating a pipeline in Data Factory that fetches data via a REST API into Azure SQL Database, I wanted to see if I could do the same within Microsoft Fabric, using the new (Synapse) Data Warehouse feature.

All steps of my original pipeline work until the last step where I call a script which basically removes duplicate rows from the SQL table.

Yggdrasill_0-1699024637962.png

 



I moved this to Fabric like so

Yggdrasill_1-1699024690132.png

 



The error I receive from Fabric is on the last step:
The query processor could not produce a query plan because the target DML table is not hash partitioned.

I know why this is happening, technically, but as things stand we can't add hash distribution to the SQL tables "stored in Fabric".

I've tried running scripts and stored procedures, and I've also created another table as an anchor to replicate the hashing method, but with no luck, so I'm asking the community for advice.

How do you remove duplicate rows in (Synapse) Data Warehouse within Fabric?

1 ACCEPTED SOLUTION
Yggdrasill
Responsive Resident

This SP would work if I wanted to keep all distinct rows, but in my case I want to remove duplicates on my key column [id], keeping only the row with the highest value in [LastSyncedDate].

Your method will still return duplicate ids.

I "solved" this by creating a view in the data warehouse that removes the duplicates. I then removed the last step of the pipeline, and I now query the view instead of the table.
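
The view itself is not shown in this thread. A minimal sketch of what such a view might look like follows, assuming the table is dbo.MyTable (the name used in the stored procedure in this thread) and that [id] and [LastSyncedDate] are the relevant columns. The view name dbo.vw_MyTable_Latest is hypothetical, and the remaining columns of the table would need to be listed by hand.

```sql
-- Sketch only: dbo.vw_MyTable_Latest is a hypothetical name, and
-- dbo.MyTable is assumed to be the table that contains the duplicates.
CREATE VIEW dbo.vw_MyTable_Latest
AS
SELECT
    id,
    LastSyncedDate
    -- list the remaining columns of dbo.MyTable here
FROM (
    SELECT
        *,
        -- Number the rows of each id, newest LastSyncedDate first
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY LastSyncedDate DESC) AS RowNum
    FROM dbo.MyTable
) AS ranked
WHERE RowNum = 1;  -- keep only the newest row per id
```

Because the view only reads from the table, it avoids the DELETE that triggers the error, at the cost of leaving the duplicate rows in storage.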


4 REPLIES 4
Anonymous
Not applicable

Hi @Yggdrasill ,
Thanks for using Fabric Community.
Can you please explain how you are removing the duplicate rows? Are you using the DROP command?
If you are using the DROP command, this is currently not supported in Fabric Warehouse.

 

vnikhilanmsft_0-1699026501423.png

Hope this helps. Please let us know if you have any further questions.

 

I created a stored procedure:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE RemoveDuplicates
AS
BEGIN
    -- Number the rows of each id, newest LastDate first
    WITH CTE AS (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY id ORDER BY LastDate DESC) AS RowNum
        FROM
            dbo.MyTable
    )
    -- Keep the newest row per id (RowNum = 1) and delete the rest
    DELETE FROM CTE
    WHERE RowNum > 1;
END;
GO
Anonymous
Not applicable

Hi @Yggdrasill ,
I tried to reproduce this and found a workaround that uses CTAS and the DISTINCT keyword in the stored procedure. I have attached screenshots for your reference.

 

1) Created a stored procedure removeDuplicates.

vnikhilanmsft_0-1699036550864.png

2) The data in Allotment table is as follows:

vnikhilanmsft_2-1699036693951.png

 

 

3) Executed the stored procedure.

vnikhilanmsft_1-1699036627553.png

Try using this workaround in your stored procedure.
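
The procedure body is only visible in the screenshots, so the following is a rough sketch of the CTAS plus DISTINCT approach described above rather than the exact code. It assumes the source table is dbo.Allotment. The target table name dbo.Allotment_Distinct is hypothetical.

```sql
-- Sketch only: dbo.Allotment_Distinct is a hypothetical target name.
CREATE PROCEDURE dbo.removeDuplicates
AS
BEGIN
    -- CTAS: build a new table that holds one copy of each fully identical row
    CREATE TABLE dbo.Allotment_Distinct
    AS
    SELECT DISTINCT *
    FROM dbo.Allotment;
END;
```

Note that DISTINCT compares whole rows, so two rows that share a key value but differ in any other column are both kept.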

Hope this helps. Please let me know if you have any further questions.

 
