Incrementally loading data with Dataflow Gen2 requires querying the max watermark of the target table first (e.g. the max id) and then passing it on as a filter in the source table query.
// At a high level, these are the steps:
A. Get the max id of the target Lakehouse table
B. Generate rows from the source by querying data where source id > target max id (from A)
C. Append only the rows from B to the target Lakehouse table
While the documentation recommends querying the target table with the Lakehouse connector and computing the max value in Power Query, I want to replace this by querying the target Lakehouse table through its SQL analytics endpoint, which is far more efficient for getting a scalar value (select max(id) from StagingLakehouse.dbo.orders) than pulling the whole table into Power Query and then reducing it to a scalar.
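Roughly, this is the kind of query I have in mind. It's only a sketch: the endpoint connection string, lakehouse and table names, the id column, and the source connection below are placeholders, and the append to the target table is configured in the dataflow's data destination settings, not in the query itself.

```
let
    // Step A: get the max id of the target table via the SQL analytics endpoint.
    // The server string and database name are placeholders for your endpoint.
    MaxIdResult = Sql.Database(
        "xxxxxxxx.datawarehouse.fabric.microsoft.com",
        "StagingLakehouse",
        [Query = "select max(id) as maxId from dbo.orders"]
    ),
    // Extract the scalar; fall back to 0 when the target table is still empty.
    MaxId = if MaxIdResult{0}[maxId] = null then 0 else MaxIdResult{0}[maxId],

    // Step B: pull only the new rows from the source (a SQL source is assumed here).
    SourceOrders = Sql.Database("source-server", "SourceDb"){[Schema = "dbo", Item = "orders"]}[Data],
    NewRows = Table.SelectRows(SourceOrders, each [id] > MaxId)
in
    // Step C: NewRows is the query output; the dataflow destination appends it
    // to the target Lakehouse table.
    NewRows
```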
My only dilemma in doing this is the known sync lag of the SQL endpoint (see SQL-Endpoint-Slow-To-Reflect-Changes-In-Lakehouse / Delayed-data-refresh-in-SQL-Analytical-Endpoint).
What if I run the incremental load frequently, within a very short span? I am worried the delay in the SQL endpoint will mess up the final table, as the most recent id will not be returned by the SQL endpoint.
Is anyone currently doing incremental loads with the Lakehouse SQL endpoint?
The Lakehouse.Contents() connector already uses the SQL analytics endpoint if you're querying a table of the Lakehouse.
https://www.reddit.com/r/MicrosoftFabric/s/gbnlQv5foU
https://www.reddit.com/r/MicrosoftFabric/s/fB3kcQFSWr
So you will run the same risk.
Great insight.
In that case, is MS working towards fixing the lag? Is there a timeline? This is extremely annoying.
What is the safest connector for this purpose ATM (even if it is inefficient)?
I am paying for all my table reads. Hence, I expect MS to cure this lag so that I can write fully qualified efficient SQL queries to bring in scalar values.
I don't know. I'm hoping to get guidance on how to handle the potential lag issue.
I think the ADLS connector (AzureStorage.DataLake() function) in combination with the DeltaLake.Table() function is an option if we want to query the Delta Tables in the 'Lake' part of the Lakehouse from Power BI Desktop.
I'm not sure if Power BI is able to do query folding with this connector, so it could perhaps be heavier on the Power Query engine.
And I'm not sure if we would get charged for ADLS read/write transactions by using this method. That would surprise me, though, as the data is logically stored in OneLake, so I think it should consume Fabric CUs rather than ADLS transactions. But I'm just guessing here.
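A minimal sketch of what I mean, assuming the OneLake ADLS URL format and the workspace, lakehouse, table, and column names below (all placeholders); I haven't verified the folding behaviour or the CU consumption of this:

```
let
    // List the files of the Delta table folder in OneLake through the ADLS Gen2 endpoint.
    // Workspace, lakehouse, and table names are placeholders.
    DeltaFiles = AzureStorage.DataLake(
        "https://onelake.dfs.fabric.microsoft.com/MyWorkspace/StagingLakehouse.Lakehouse/Tables/orders",
        [HierarchicalNavigation = false]
    ),
    // Read the file listing as a Delta table (this goes straight to the 'Lake' files,
    // so the SQL endpoint and its sync lag are not involved).
    Orders = DeltaLake.Table(DeltaFiles),
    // Reduce to the watermark scalar.
    MaxId = List.Max(Orders[id])
in
    MaxId
```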
@frithjof_v if there were no lag, the best solution would be to use the SQL endpoint to get a scalar value for the incremental load. It is efficient and has a better query plan.
Even if you use the Lakehouse connector, it first brings in the whole table and then you compute the scalar value from it, which to me is also inefficient. Isn't that causing unnecessary reads?
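To illustrate, this is roughly the pattern the Lakehouse connector gives you (a sketch; the navigation keys are the ones Dataflow Gen2 generates for me, and the ids and the id column are placeholders):

```
let
    // Navigate to the target table with the Lakehouse connector.
    // The workspace and lakehouse ids are placeholders.
    Source = Lakehouse.Contents(null),
    Workspace = Source{[workspaceId = "<workspace-id>"]}[Data],
    TargetLakehouse = Workspace{[lakehouseId = "<lakehouse-id>"]}[Data],
    Orders = TargetLakehouse{[Id = "orders", ItemKind = "Table"]}[Data],
    // The whole table is exposed as a Power Query table, and only then
    // is the scalar computed from it.
    MaxId = List.Max(Orders[id])
in
    MaxId
```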
Looks like AzureStorage.DataLake combined with DeltaLake.Table could be the safest option ATM, even though it might require more CUs for reading (I am guessing).
I hope someone from MS comments in this thread with respect to:
A. A timeline for fixing the lag in the Lakehouse SQL endpoint.
B. Until that happens, what is the safest connector to use?
C. What is the read impact of B on CUs?