Incrementally loading data with Dataflow Gen2 requires querying the max watermark of the target table first (e.g. the max id) and then passing it on as a filter in the source table query.
// At a high level, these are the steps:
A. Get the max id of the target Lakehouse table
B. Generate rows from the source by querying data where source id > target max id (from A)
C. Append only the rows from B to the target Lakehouse table
While the documentation recommends querying the target table with the Lakehouse connector and computing the max value in Power Query, I want to replace this by querying the target Lakehouse table through its SQL analytics endpoint, which is far more efficient for getting a scalar value (select max(id) from StagingLakehouse.dbo.orders) than pulling the whole table into Power Query and then reducing it to a scalar.
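Roughly, this is the kind of query I have in mind. It's only a sketch: the endpoint connection string, lakehouse and table names, the id column, and the source connection below are placeholders, and the append to the target table is configured in the dataflow's data destination settings, not in the query itself.

```
let
    // Step A: get the max id of the target table via the SQL analytics endpoint.
    // The server string and database name are placeholders for your endpoint.
    MaxIdResult = Sql.Database(
        "xxxxxxxx.datawarehouse.fabric.microsoft.com",
        "StagingLakehouse",
        [Query = "select max(id) as maxId from dbo.orders"]
    ),
    // Extract the scalar; fall back to 0 when the target table is still empty.
    MaxId = if MaxIdResult{0}[maxId] = null then 0 else MaxIdResult{0}[maxId],

    // Step B: pull only the new rows from the source (a SQL source is assumed here).
    SourceOrders = Sql.Database("source-server", "SourceDb"){[Schema = "dbo", Item = "orders"]}[Data],
    NewRows = Table.SelectRows(SourceOrders, each [id] > MaxId)
in
    // Step C: NewRows is the query output; the dataflow destination appends it
    // to the target Lakehouse table.
    NewRows
```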
My only dilemma in doing this is the known sync lag of the SQL endpoint (see SQL-Endpoint-Slow-To-Reflect-Changes-In-Lakehouse / Delayed-data-refresh-in-SQL-Analytical-Endpoint).
What if I run the incremental load frequently, within a very short span? I am worried the delay in the SQL endpoint will mess up the final table, as the most recent id will not be returned by the SQL endpoint.
Is anyone currently doing incremental loads with the Lakehouse SQL endpoint?
The Lakehouse.Contents() connector already uses the SQL analytics endpoint if you're querying a table of the Lakehouse.
https://www.reddit.com/r/MicrosoftFabric/s/gbnlQv5foU
https://www.reddit.com/r/MicrosoftFabric/s/fB3kcQFSWr
So you will run the same risk.
Great insight.
In that case, is MS working towards fixing the lag? Is there a timeline? This is extremely annoying.
What is the safest connector for this purpose ATM (even if it is inefficient)?
I am paying for all my table reads. Hence, I expect MS to cure this lag so that I can write fully qualified efficient SQL queries to bring in scalar values.
I don't know. I'm hoping to get guidance on how to handle the potential lag issue.
I think the ADLS connector (AzureStorage.DataLake() function) in combination with the DeltaLake.Table() function is an option if we want to query the Delta Tables in the 'Lake' part of the Lakehouse from Power BI Desktop.
I'm not sure if Power BI is able to do query folding with this connector, so it could perhaps be heavier on the Power Query engine.
And I'm not sure if we would get charged for ADLS read/write transactions by using this method. That would surprise me, though, as the data is logically stored in OneLake, so I think it should consume Fabric CUs rather than ADLS transactions. But I'm just guessing here.
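A minimal sketch of what I mean, assuming the OneLake ADLS URL format and the workspace, lakehouse, table, and column names below (all placeholders); I haven't verified the folding behaviour or the CU consumption of this:

```
let
    // List the files of the Delta table folder in OneLake through the ADLS Gen2 endpoint.
    // Workspace, lakehouse, and table names are placeholders.
    DeltaFiles = AzureStorage.DataLake(
        "https://onelake.dfs.fabric.microsoft.com/MyWorkspace/StagingLakehouse.Lakehouse/Tables/orders",
        [HierarchicalNavigation = false]
    ),
    // Read the file listing as a Delta table (this goes straight to the 'Lake' files,
    // so the SQL endpoint and its sync lag are not involved).
    Orders = DeltaLake.Table(DeltaFiles),
    // Reduce to the watermark scalar.
    MaxId = List.Max(Orders[id])
in
    MaxId
```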
@frithjof_v if there were no lag, the best solution would be to use the SQL endpoint to get a scalar value for the incremental load. It is efficient and has a better query plan.
Even if you use the Lakehouse connector, it first brings in the whole table and then you compute the scalar value from it, which to me is also inefficient. Isn't that causing unnecessary reads?
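To illustrate, this is roughly the pattern the Lakehouse connector gives you (a sketch; the navigation keys are the ones Dataflow Gen2 generates for me, and the ids and the id column are placeholders):

```
let
    // Navigate to the target table with the Lakehouse connector.
    // The workspace and lakehouse ids are placeholders.
    Source = Lakehouse.Contents(null),
    Workspace = Source{[workspaceId = "<workspace-id>"]}[Data],
    TargetLakehouse = Workspace{[lakehouseId = "<lakehouse-id>"]}[Data],
    Orders = TargetLakehouse{[Id = "orders", ItemKind = "Table"]}[Data],
    // The whole table is exposed as a Power Query table, and only then
    // is the scalar computed from it.
    MaxId = List.Max(Orders[id])
in
    MaxId
```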
Looks like AzureStorage.DataLake combined with DeltaLake.Table could be the safest option ATM, even though it might require more CUs for reading (I am guessing).
I hope someone from MS comments in this thread with respect to:
A. A timeline for fixing the lag in the Lakehouse SQL endpoint.
B. Until that happens, what is the safest connector to use?
C. What is the read impact of B on CUs?