michaelgambling
Regular Visitor

Data Ingestion Troubles

Hi All,

 

Pulling my teeth out a bit trying to get my head around data ingestion.

 

I am new to Fabric and data engineering in general. I am trying to set up a metadata-driven pipeline that implements a 'watermark strategy'.

 

I have many questions on this, but my first one is this: in many YouTube videos and documentation articles I read, people suggest using a date/time column as your 'watermark'.
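From what I gather, the suggested pattern is a source query filtered on that column against a stored 'last watermark' value, something like this (table, column and parameter names are just an example, not my real schema):

SELECT Id, SomeColumn, LastModifiedDate
FROM dbo.MySourceTable
WHERE LastModifiedDate > @LastWatermark;  -- @LastWatermark = value saved from the previous run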

 

I have tried ingesting a table with a nominated column, but it never works. When I try to use the 'Upsert' feature it always results in this error:

 

"ErrorCode=FailedToUpsertDataIntoDeltaTable,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Hit an error when upsert data to table in Lakehouse. Error message: Could not load file or assembly 'System.Linq.Async, Version=6.0.0.0, Culture=neutral, PublicKeyToken=94bc3704cddfc263' or one of its dependencies. The system cannot find the file specified.,Source=Microsoft.DataTransfer.Connectors.LakehouseTableConnector,''Type=System.IO.FileNotFoundException,Message=Could not load file or assembly 'System.Linq.Async, Version=6.0.0.0, Culture=neutral, PublicKeyToken=94bc3704cddfc263' or one of its dependencies. The system cannot find the file specified.,Source=Microsoft.DataTransfer.Connectors.LakehouseTableConnector,'"

 

All my research indicates this is a Fabric platform issue and nothing can be done from my end. Therefore I am forced to use the 'Append' table action.

 

When I run the pipeline after setup:

 

- The pipeline runs for the first time; data is ingested and looks good.

- I make some test changes to historical data that has already been ingested into the Lakehouse tables.

- I run the pipeline a second time; those changes are added as a new row, with the original row remaining present.

- As a result, I end up with an additional accurate row and a previous row of stale, outdated and inaccurate data.

 

I cannot seem to find a way to make this work. I have spent three solid days tinkering with things. There is minimal documentation, and it often refers to Data Factory articles from before it was rolled into Fabric. YouTube videos are never relevant; they only cover scenarios where additional rows are appended, or people working with .CSV files. I need articles that address actual enterprise-level scenarios, as opposed to a .CSV file with a few movies listed in a small handful of rows where the data remains completely static.

 

My scenario is:

1) I have on-premises MS SQL servers that are connected to my Fabric workspace via a data gateway.

2) I am utilising SQL SELECT queries to extract only particular columns from my SQL tables.

3) I have a data warehouse that holds my watermark and metadata control tables (roughly as sketched after this list).

4) I have a Lakehouse where my ingested data will sit.

5) I am trying to set up a data pipeline (avoiding Dataflow Gen2s in order to have as little resource impact on my source server as possible).

6) I do not want to overwrite the ingested tables every time the schedule runs, and since I am seemingly unable to use the Upsert preview feature at the moment, I am forced to use the 'Append' table action.
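For reference, the watermark control table mentioned in point 3 is roughly shaped like this (illustrative, simplified column names):

CREATE TABLE dbo.WatermarkControl (
    SourceTableName    VARCHAR(200),  -- table to ingest
    WatermarkColumn    VARCHAR(100),  -- nominated date/time column
    LastWatermarkValue DATETIME2      -- high-water mark from the last successful run
);

-- After each successful load, the pipeline writes the new high-water mark back:
UPDATE dbo.WatermarkControl
SET LastWatermarkValue = @NewWatermark
WHERE SourceTableName = @SourceTableName;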

 

I appear to need some SQL MERGE logic; however, as you are aware, the Lakehouse SQL endpoint is read-only, so as far as I can tell that is not really an option? I am aware that using a Warehouse would be a workaround for this, however I need my data in a Lakehouse so I can leverage ML features down the track.

 

I am very new to this and getting insanely frustrated trying to make sense of it all. I have spent week upon week making tiny amounts of progress. This solution is supposedly designed to make things easier for the end user, but I cannot see how it does. Any help would be appreciated; as you can likely tell, I am getting a little frustrated 😞

 

* Just to add, I am also aware of a CDC implementation, however it would require my on-prem SQL server to become public facing and this cannot happen, so unfortunately that is not an option.

 

 

v-prasare
Community Support

Hi @michaelgambling,

May I ask if you have resolved this issue? If so, can you please share the resolution steps here? This will help other community members with similar problems resolve them faster.


If we don't hear back, we'll go ahead and close this thread. For any further discussions or questions, please start a new thread in the Microsoft Fabric Community Forum; we'll be happy to assist.
Thank you for being part of the Microsoft Fabric Community.

v-prasare
Community Support

Hi @michaelgambling,

We would like to confirm whether our community member's answer resolves your query or whether you need further help. If you still have any questions or need more support, please feel free to let us know. We are happy to help you.

 

Thank you for your patience; we look forward to hearing from you.
Best Regards,
Prashanth Are

apturlov
Advocate I

@michaelgambling glad you found a workaround, though I am curious whether you have considered using a mirror for your on-prem SQL database and CDC in your data pipeline. I take it that your upsert strategy is aiming at incremental data load, which technically should be easier to implement with CDC, as it's supposed to be natively supported for SQL databases.

Yes, I was aware of this setup, but unfortunately I believe I would need to expose my on-prem SQL servers to the public to use this method (from my understanding), which is not an option 😞

I believe that exposing the on-prem database to the public network is not necessary, as mirroring is supposed to work via the On-premises Data Gateway: Microsoft Fabric Mirrored Databases From SQL Server

michaelgambling
Regular Visitor

Thanks to all who have written. Just to provide an update, I did utilise some merge logic as @smeetsh recommended.

 

I have used this as a workaround to the Upsert issues, should anyone else be facing similar problems when trying to implement their watermark strategy. I have structured my pipeline as per the canvas image below:
 
[Screenshot: pipeline canvas]
 
The pipeline is derived mainly from this article; however, I have made some changes:
 
1) I used a notebook activity to:

a) create an initial 'staging' table, and
b) create a 'destination' table.
 
My Copy data activity now copies into the 'staging' table first; a notebook then compares the pre-existing contents of the destination table with the staging table and merges any changes. Once done, the staging table is truncated to clear the data ready for the next run.
 
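Roughly, the notebook does something like the below (illustrative table names and a single id key; simplified from my actual schema):

%%sql
-- Upsert staged rows into the destination table
MERGE INTO destination_table AS target
USING staging_table AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *;

-- Then, as a second cell, clear the staging table ready for the next run
DELETE FROM staging_table;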
 
Although this takes more time, it is the only workaround I have found to achieve a working upsert operation. If anyone has any suggestions or recommendations, please do let me know.
 
I had to tweak it a bit to suit my on-prem scenario (.4 of Step 4) because I am ingesting tables and not files. Besides that, the only other additional step added was the MERGE & TRUNCATE notebook, alongside pre-creating the two tables before running the pipeline.
 
I now have a schedule that runs hourly overnight to ensure data integrity is kept between the source and sink.
v-prasare
Community Support

Hi @michaelgambling,

We would like to confirm whether our community member's answer resolves your query or whether you need further help. If you still have any questions or need more support, please feel free to let us know. We are happy to help you.

 

 

Thanks,

Prashanth

MS Fabric community support

jaz0894
New Member

I am having the same issue, consuming data from Oracle with Upsert selected:

ErrorCode=FailedToUpsertDataIntoDeltaTable,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Hit an error when upsert data to table in Lakehouse. Error message: Could not load file or assembly 'System.Linq.Async, Version=6.0.0.0, Culture=neutral, PublicKeyToken=94bc3704cddfc263' or one of its dependencies. The system cannot find the file specified.,Source=Microsoft.DataTransfer.Connectors.LakehouseTableConnector,''Type=System.IO.FileNotFoundException,Message=Could not load file or assembly 'System.Linq.Async, Version=6.0.0.0, Culture=neutral, PublicKeyToken=94bc3704cddfc263' or one of its dependencies. The system cannot find the file specified.,Source=Microsoft.DataTransfer.Connectors.LakehouseTableConnector,'

The exact same error. 

Are you going via an on-prem data gateway? If so, it appears there is a known issue that is actively being worked on: https://support.fabric.microsoft.com/known-issues/

Hope this helps answer some questions.

jaz0894
New Member

The exact same thing is happening to me; in my case I am consuming data from an Oracle database.
When I run the pipeline the first time, it creates the table, but when I select Upsert and change my initial SQL to bring just the latest changes, I get: ErrorCode=FailedToUpsertDataIntoDeltaTable,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Hit an error when upsert data to table in Lakehouse. Error message: Could not load file or assembly 'System.Linq.Async, Version=6.0.0.0, Culture=neutral, PublicKeyToken=94bc3704cddfc263' or one of its dependencies. The system cannot find the file specified.,Source=Microsoft.DataTransfer.Connectors.LakehouseTableConnector,''Type=System.IO.FileNotFoundException,Message=Could not load file or assembly 'System.Linq.Async, Version=6.0.0.0, Culture=neutral, PublicKeyToken=94bc3704cddfc263' or one of its dependencies. The system cannot find the file specified.,Source=Microsoft.DataTransfer.Connectors.LakehouseTableConnector,'

smeetsh
Helper V

Is it an option for you to use SQL and the gateway connector to get your data from the on-prem SQL server and load it into what I would call a raw table (for instance, table name: mydata_raw), and then use a MERGE in a notebook to merge your raw data with your existing data? Since you only ingest specific data from your SQL server, overwriting the "mydata_raw" table should not impact CUs too much.

The notebook code would look something like the below ("name" is just an example column):

%%sql
-- Merge the freshly loaded raw data into the existing table:
-- update matching rows, insert new ones.
MERGE INTO mydata AS target
USING mydata_raw AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET target.name = source.name
WHEN NOT MATCHED THEN
  INSERT (id, name) VALUES (source.id, source.name);

 

Cheers

Hans
 
