smeetsh
Continued Contributor

Insert into warehouse table has missing rows, but the source script has all rows

Hi All,

 

I have something strange going on with one of my pipelines since the 28th of January. This pipeline has been running fine for a year now, and it still completes without any errors.

 

The pipeline pulls data from a Snowflake connection. Next we use the SQL endpoint of the lakehouse (I have implemented a 6-minute delay before querying the SQL endpoint with a copy activity, to make sure the SQL endpoint data is refreshed; MS bug #1092 if I recall correctly).

 

The source for the copy job is a pretty standard SQL script, with joins to our dim tables, date-type conversions, some CASE WHEN, etc. When I run just the script I see all the data that I expect to see. The number of rows is 4549, the number of columns 65. So it is not a huge data set at all.

 

If I compare my data in the lakehouse (bronze) with the data in the warehouse (gold), a number of rows are missing. They are all the latest rows; for instance, I checked this morning (for us it is the 4th) and no data from the 3rd was in the warehouse table.

 

At first I thought this could be caused by the SQL endpoint delay, but that doesn't make too much sense, since when I first noticed this on the second of February, the latest data was from the 28th. Since then the pipeline had run successfully about 8 times (it runs 2x a day).

 

The fact that my script displays all rows, but somehow the copy job does not ingest all rows, puzzles me. It is a copy job after all, with no user input besides the source script and column mapping.

 

The pipeline does have a number of copy jobs in parallel, however; could that be the issue, and is there a setting I can tweak? I am stumped: if the source query has all the rows, the mapping hasn't changed and the pipeline runs without error, why would the copy job not insert all rows?

Below is a screenshot of the pipeline, so you can see how it is laid out. It does have 8 parallel steps; the one where I noticed it is the step 2000_copy etc.

smeetsh_0-1770165141830.png

 

Cheers

Hans


(If my solution was useful, please give it a kudo and mark it as a solution)
2 ACCEPTED SOLUTIONS
stoic-harsh
Resolver III

Hi @smeetsh,

I have sometimes observed silent record loss in copy jobs due to SQL Endpoint snapshot lag. And I would rule out schema/data-type changes or a copy-logic issue, since the same setup has been running successfully for a year.

I suggest either reading directly from the Lakehouse (keeping the 6-minute delay), or increasing the wait time if you would like to keep using the SQL Endpoint.

View solution in original post

bariscihan
Resolver II

Hi all,

After digging deeper, this doesn’t look like a Copy activity or mapping issue. The key point is that rowsRead = rowsWritten in the Copy activity output, and the activity completes successfully without any warnings or errors. This rules out sink-side rejections, data type issues, or constraint violations in the Warehouse.

What we are actually seeing is a data freshness / synchronization issue caused by the SQL Endpoint.

Although the source query returns all expected rows when executed manually, the Copy activity reads from the SQL Endpoint snapshot, which can lag behind the Lakehouse data. This explains why:

  • The latest rows are missing in the Warehouse

  • The issue is intermittent

  • The pipeline succeeds with correct row counts from the snapshot it sees

Increasing the wait time helps, but it is not deterministic, especially when multiple pipelines and parallel copy activities are running.

Confirmed solution / recommendation:

  • Avoid using the SQL Endpoint as a source for time-sensitive ingestion.

  • Read directly from the Lakehouse (Files/Delta tables) instead.

  • If SQL Endpoint must be used, apply a much larger delay and treat it as eventually consistent, not real-time.

Once we switched to reading directly from the Lakehouse, the missing rows issue disappeared completely.

This confirms the root cause is SQL Endpoint snapshot lag, not the Copy activity itself.

Hope this helps anyone running into similar “silent missing rows” behavior.

Cheers
Barış

View solution in original post

8 REPLIES
smeetsh
Continued Contributor

Thanks for the help all. I suspected it was the SQL endpoint delay, and I am glad I wasn't going mad :D. I have now increased it to 15 minutes; the data is not that time sensitive. Since our gold medallion layer is a warehouse, and as far as I know I cannot write from a lakehouse to a warehouse, changing the architecture is not an option right now. I may have to start using the sempy labs SQL endpoint refresh API if this gets any worse.
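For anyone considering the same route: the metadata refresh can be triggered from a notebook activity placed before the copy step. A minimal sketch, assuming the preview Fabric REST route for refreshing SQL endpoint metadata; the route, IDs, and token handling below are illustrative and should be verified against the current Fabric REST docs:

```python
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def refresh_metadata_url(workspace_id: str, sql_endpoint_id: str) -> str:
    # Assumed (preview) route; verify against the current Fabric REST API docs.
    return (f"{FABRIC_API}/workspaces/{workspace_id}"
            f"/sqlEndpoints/{sql_endpoint_id}/refreshMetadata")

def refresh_sql_endpoint(workspace_id: str, sql_endpoint_id: str, token: str) -> int:
    req = urllib.request.Request(
        refresh_metadata_url(workspace_id, sql_endpoint_id),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    # urlopen raises on non-2xx, so a rejected refresh surfaces as an error
    # instead of the pipeline silently reading a stale snapshot afterwards.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status
```

Running this (or the equivalent sempy-labs helper) right before the copy activity makes the endpoint refresh explicit instead of hoping a fixed delay was long enough.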

Cheers

Hans

bariscihan
Resolver II

Hi all,

After digging deeper, this doesn’t look like a Copy activity or mapping issue. The key point is that rowsRead = rowsWritten in the Copy activity output, and the activity completes successfully without any warnings or errors. This rules out sink-side rejections, data type issues, or constraint violations in the Warehouse.

What we are actually seeing is a data freshness / synchronization issue caused by the SQL Endpoint.

Although the source query returns all expected rows when executed manually, the Copy activity reads from the SQL Endpoint snapshot, which can lag behind the Lakehouse data. This explains why:

  • The latest rows are missing in the Warehouse

  • The issue is intermittent

  • The pipeline succeeds with correct row counts from the snapshot it sees

Increasing the wait time helps, but it is not deterministic, especially when multiple pipelines and parallel copy activities are running.

Confirmed solution / recommendation:

  • Avoid using the SQL Endpoint as a source for time-sensitive ingestion.

  • Read directly from the Lakehouse (Files/Delta tables) instead.

  • If SQL Endpoint must be used, apply a much larger delay and treat it as eventually consistent, not real-time.

Once we switched to reading directly from the Lakehouse, the missing rows issue disappeared completely.

This confirms the root cause is SQL Endpoint snapshot lag, not the Copy activity itself.

Hope this helps anyone running into similar “silent missing rows” behavior.

Cheers
Barış

stoic-harsh
Resolver III

Hi @smeetsh,

I have sometimes observed silent record loss in copy jobs due to SQL Endpoint snapshot lag. And I would rule out schema/data-type changes or a copy-logic issue, since the same setup has been running successfully for a year.

I suggest either reading directly from the Lakehouse (keeping the 6-minute delay), or increasing the wait time if you would like to keep using the SQL Endpoint.
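If the SQL Endpoint has to stay as the source, the wait can also be made a polling check instead of a blind delay. A minimal sketch; `get_latest` is a hypothetical callable you would implement yourself, e.g. a `SELECT MAX(load_ts)` against the endpoint, and `expected` is the watermark you know the bronze lakehouse contains:

```python
import time

def wait_for_fresh(get_latest, expected, timeout_s=900, poll_s=30, sleep=time.sleep):
    """Poll until the SQL endpoint reports data at least as new as `expected`.

    get_latest: callable returning the newest watermark visible via the endpoint
    expected:   the watermark the bronze lakehouse is known to contain
    Returns True once the endpoint has caught up, False on timeout.
    """
    waited = 0
    while True:
        if get_latest() >= expected:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

The copy activity then only runs when `wait_for_fresh` returns True, so a slow snapshot refresh delays the load instead of silently truncating it.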

smeetsh
Continued Contributor

It certainly feels like the SQL endpoint delay issue; I just don't understand why it has suddenly become an issue since the 28th. The delay is already at 400 seconds and had been for ages; I will increase it even more (600 seconds). We have dozens of pipelines built this way, but only this one seems affected.

 

Cheers

Hans

v-dineshya
Community Support

Hi @smeetsh ,

Thank you for reaching out to the Microsoft Community Forum.

 

Please try the following steps to fix the issue.

 

1. Check one failed run activity Output JSON: Open the run --> the specific Copy activity like "2000_copy_incident" --> Output.

Capture: rowsRead, rowsWritten, Any error or skip counts, Reject log path, if present. If rowsRead = 4549 and rowsWritten < 4549, you have sink rejections.

2. In the Copy activity --> Fault tolerance: Set Maximum errors = 0 and Disable Skip incompatible rows.

3. Re-run. If the activity now fails, you will get the exact cause like PK violation, conversion error, string truncation, date overflow, etc.

4. Check data types & lengths for every column: VARCHAR/NVARCHAR lengths, DECIMAL(p,s) ranges and DATETIME/DATE conversions like TIMESTAMP_TZ from Snowflake --> SQL

Note: Newest rows often introduce longer values or unexpected nullability.

5. Check Warehouse constraints & indexes, Does the target table have PRIMARY KEY or UNIQUE constraints and Computed columns with constraints. Try inserting one of the “missing” rows manually into Warehouse using the same values.

6. Check there’s no hidden filter, parameter substitution, pipeline variables, or expressions like @{formatDateTime(utcNow(), 'yyyy-MM-dd')}. Run the exact command text from the activity using the same linked service and count the rows. If counts match the manual run (4549) but Copy lands fewer, it’s a sink issue.
If counts differ, it’s a source query/session issue.
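The comparison in step 1 can be automated once you export the activity run output. A small sketch parsing a simplified, illustrative Copy activity output payload (real payloads nest more detail, but the rowsRead/rowsWritten fields carry the key signal):

```python
import json

def copy_row_counts(output_json: str):
    """Extract rowsRead / rowsWritten from a Copy activity output payload.

    The field names follow the Copy activity's run output; the sample
    below is simplified for illustration.
    """
    out = json.loads(output_json)
    return out.get("rowsRead"), out.get("rowsWritten")

sample = '{"rowsRead": 4549, "rowsWritten": 4549, "rowsSkipped": 0}'
read, written = copy_row_counts(sample)
if read == written:
    # No sink-side rejections: the rows were never read, so look upstream
    # (e.g. a stale source snapshot) rather than at the Warehouse sink.
    pass
```

If `rowsRead` equals `rowsWritten` but rows are still missing in the target, the activity faithfully copied everything it could see, which points at the source snapshot rather than the sink.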

 

I hope this information helps. Please do let us know if you have any further queries.

 

Regards,

Dinesh

 

 

 

smeetsh
Continued Contributor

The activity doesn't fail; it runs without any errors. If I manually run the one step, it seems to work as well.

 

Edit: I had a look at the specific copy step, and the number of rows read equals the number of rows copied, so we are experiencing a sync issue, I reckon?

smeetsh_0-1770235053294.png

 

 

Cheers

Hans


Hi @smeetsh ,

As mentioned by @stoic-harsh, could you please read directly from the Lakehouse instead of the SQL Endpoint?

 

Regards,

Dinesh

smeetsh
Continued Contributor

Reading directly from a lakehouse is sadly not an option at this time.

Cheers

Hans

