After successfully creating a pipeline in Data Factory that fetches data via a REST API into an Azure SQL Database, I wanted to see if I could do the same in Microsoft Fabric using its new (Synapse) Data Warehouse feature.
All steps of my original pipeline work until the last one, where I call a script that removes duplicate rows from the SQL table.
I moved this to Fabric as follows:
The error I receive from Fabric is on the last step:
The query processor could not produce a query plan because the target DML table is not hash partitioned.
I understand technically why this happens, but as things stand we can't add hash partitioning to SQL tables stored in Fabric.
I've tried running scripts and stored procedures, and I've also created another table as an anchor to replicate the hashing method, all with no luck, so I'm asking the community for advice.
How do you remove duplicate rows in (Synapse) Data Warehouse within Fabric?
This SP would work if I wanted to keep all unique rows, but in my case I want to remove duplicates on my key column [id], keeping the row with the highest value in [LastSyncedDate].
Your method will still return duplicate IDs.
I "solved" this by creating a view in the data warehouse that removes the duplicates. I then removed the last step of the pipeline and now query the view instead of the table.
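The view-based workaround can be sketched as below. This is a local simulation using Python's built-in `sqlite3` module (SQLite 3.25 or later, for window functions), not Fabric itself; the table name `MyTable`, the view name `MyTable_Latest`, and the `payload` column are hypothetical. The view numbers each id's rows from newest to oldest by `LastSyncedDate` with `ROW_NUMBER()` and exposes only row 1, so the underlying table keeps its duplicates while anything querying the view sees one row per id. The SELECT is plain window-function SQL, so the same definition should carry over to T-SQL, but it has not been tested against a Fabric warehouse here.

```python
import sqlite3

# Local stand-in for the warehouse: an in-memory SQLite database.
# Names (MyTable, MyTable_Latest, payload) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (id INTEGER, payload TEXT, LastSyncedDate TEXT);
INSERT INTO MyTable VALUES
    (1, 'old',   '2023-06-01T08:00:00'),
    (1, 'new',   '2023-06-02T08:00:00'),
    (2, 'only',  '2023-06-01T09:00:00'),
    (3, 'first', '2023-06-01T10:00:00'),
    (3, 'last',  '2023-06-03T10:00:00');

-- Rank each id's rows newest first and expose only the newest one.
CREATE VIEW MyTable_Latest AS
SELECT id, payload, LastSyncedDate
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY LastSyncedDate DESC) AS RowNum
    FROM MyTable
) AS ranked
WHERE RowNum = 1;
""")

rows = conn.execute("SELECT id, payload FROM MyTable_Latest ORDER BY id").fetchall()
print(rows)  # [(1, 'new'), (2, 'only'), (3, 'last')]
```

The base table still holds all five rows; only readers of the view see the deduplicated result.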
Hi @Yggdrasill ,
Thanks for using Fabric Community.
Can you please explain how you are removing the duplicate rows? Are you using the DROP command?
If you are using the DROP command, it is currently not supported in Fabric Warehouse.
Hope this helps. Please let us know if you have any further questions.
I created a stored procedure:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE RemoveDuplicates
AS
BEGIN
    WITH CTE AS (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY id ORDER BY LastDate DESC) AS RowNum
        FROM
            dbo.MyTable
    )
    DELETE FROM CTE
    WHERE RowNum > 1;
END;
GO
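The procedure above deletes through the CTE, which is the statement that fails. One commonly used alternative, sketched here under assumptions, is a correlated `DELETE ... WHERE EXISTS` that targets the table directly and removes every row for which a newer row with the same `id` exists. This is a swapped-in technique, not one confirmed in this thread, and whether Fabric Warehouse accepts it without the same hash-partitioning error is untested. The sketch runs against in-memory SQLite via Python's `sqlite3`; the `payload` column and the sample data are hypothetical, while `LastDate` follows the procedure.

```python
import sqlite3

# In-memory SQLite stand-in; payload and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (id INTEGER, payload TEXT, LastDate TEXT);
INSERT INTO MyTable VALUES
    (1, 'old',  '2023-06-01'),
    (1, 'new',  '2023-06-02'),
    (2, 'only', '2023-06-01'),
    (3, 'a',    '2023-06-05'),
    (3, 'b',    '2023-06-05');
""")

# Delete any row that has a strictly newer row with the same id.
# The DELETE targets the table itself rather than a CTE.
conn.execute("""
DELETE FROM MyTable
WHERE EXISTS (
    SELECT 1
    FROM MyTable AS newer
    WHERE newer.id = MyTable.id
      AND newer.LastDate > MyTable.LastDate
)
""")
conn.commit()

remaining = conn.execute(
    "SELECT id, payload FROM MyTable ORDER BY id, payload"
).fetchall()
print(remaining)  # [(1, 'new'), (2, 'only'), (3, 'a'), (3, 'b')]
```

Note the tie case: rows that share both `id` and `LastDate` (id 3 here) all survive, whereas `ROW_NUMBER()` keeps exactly one of them.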
Hi @Yggdrasill ,
I tried to create a repro with a workaround that uses CTAS and the DISTINCT keyword in the stored procedure. I have attached screenshots for your reference.
1) Created a stored procedure, removeDuplicates.
2) The data in the Allotment table is as follows:
3) Executed the stored procedure.
Try using this workaround in your stored procedure.
Hope this helps. Please let me know if you have any further questions.
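`DISTINCT` only collapses rows that are identical in every column, so an `id` synced at two different times survives twice, which is the objection raised in the accepted answer. A CTAS variant that keeps only the newest row per `id` could look like the sketch below: build a staging table with `ROW_NUMBER()`, empty the original table with a plain `DELETE`, reload it, and drop the staging table. It runs against in-memory SQLite through Python's `sqlite3` (3.25 or later), uses hypothetical names (`MyTable`, `MyTable_Staging`, `payload`), and is not the moderator's exact procedure. On Fabric, check that each statement, including the final `DROP TABLE`, is supported in your warehouse before wrapping it in a stored procedure.

```python
import sqlite3

# In-memory SQLite stand-in; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (id INTEGER, payload TEXT, LastSyncedDate TEXT);
INSERT INTO MyTable VALUES
    (1, 'old',  '2023-06-01T08:00:00'),
    (1, 'new',  '2023-06-02T08:00:00'),
    (1, 'new',  '2023-06-02T08:00:00'),  -- exact duplicate
    (2, 'only', '2023-06-01T09:00:00');
""")

# DISTINCT removes only exact duplicates: id 1 still appears twice.
distinct_ids = conn.execute(
    "SELECT id FROM (SELECT DISTINCT * FROM MyTable) AS d ORDER BY id"
).fetchall()
print(distinct_ids)  # [(1,), (1,), (2,)]

conn.executescript("""
-- 1) CTAS: copy the newest row per id into a staging table.
CREATE TABLE MyTable_Staging AS
SELECT id, payload, LastSyncedDate
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY LastSyncedDate DESC) AS RowNum
    FROM MyTable
) AS ranked
WHERE RowNum = 1;

-- 2) Empty the original table (no CTE involved) and reload it from staging.
DELETE FROM MyTable;
INSERT INTO MyTable (id, payload, LastSyncedDate)
SELECT id, payload, LastSyncedDate FROM MyTable_Staging;

-- 3) Remove the staging table.
DROP TABLE MyTable_Staging;
""")

final_rows = conn.execute("SELECT id, payload FROM MyTable ORDER BY id").fetchall()
print(final_rows)  # [(1, 'new'), (2, 'only')]
```

Reloading the original table, rather than renaming the staging table over it, keeps the table's identity intact for anything else that references it, such as the pipeline's copy step.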