Hi all
I got stuck on a simple task.
I need to read from PostgreSQL that sits on-prem behind a gateway.
The gateway works for workflows and the Copy activity, but is not available in notebooks.
Postgres has some huge tables, so I need to read only the delta.
I need to look up the max recorded timestamp in the target table, read the records after that timestamp from the source table, and merge.
I prefer to merge the data into a Lakehouse table, which will be my bronze layer.
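In rough SQL terms, the pattern I'm after is something like this (data.stockmaster and lastmodified are my real source table and watermark column; the bronze table name is just a placeholder):

-- 1. Find the high-water mark already loaded into the bronze table
SELECT MAX(lastmodified) FROM bronze_stockmaster;

-- 2. Read only newer rows from the Postgres source
SELECT * FROM data.stockmaster WHERE lastmodified > (result of step 1);

-- 3. Merge those rows into the bronze table.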
Sadly I can't find how to achieve this.
I tried a pipeline.
First, I need to look up the latest timestamp.
I tried the Script activity. It does not apply to a lakehouse source (regardless of it having a SQL endpoint!?).
So I employed a Lookup activity.
I tried to play with "add dynamic content" and apply the function "max".
This is from the documentation:
Is this use case for people who can't figure out which of their hardcoded values is bigger and need to employ a function over them?
I can't figure out how to apply this to a column.
If anyone knows how to do that, please help!
Also, I'm very concerned about performance. Is it going to load the terabyte table into memory to calculate the max of the column?
Next, Spark.
Spark can't read from gatewayed sources.
Finally, I tried a Dataflow Gen2.
The first query selects the max state.
The second uses it as a filter.
So far, no luck.
--2nd step--
Please help me figure out how to use the result of the previous query in a filter of the next one!
Hi @silly_bird,
To get the MAX value of a column, you could use the Lookup activity in combination with a SQL Server connection. As I can see in your screenshots, you have a lakehouse connection defined. Instead of defining a lakehouse connection, set up a SQL Server connection and use the SQL endpoint of the lakehouse in its settings. With a SQL Server connection, you are able to query the table in the Lookup activity, and in that query you can take the MAX value directly within your SELECT statement. The subsequent steps of the data pipeline can then use the queried max value.
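For example, assuming your bronze table is called stockmaster and the watermark column is lastmodified (adjust to your schema), the query in the Lookup activity against the SQL endpoint could simply be:

SELECT MAX(lastmodified) AS lastmodified FROM dbo.stockmaster

With "First row only" enabled, the result is then available to later activities as activity('your lookup name').output.firstRow.lastmodified.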
Yes, I also found this through manually defining the lakehouse as an Azure SQL Server connection.
My pipeline activities then look like this:
Parametrisation of the query in the Copy activity was very difficult to figure out, but through searching (Stack Overflow etc.) I came up with the following formula:
@concat('SELECT * FROM data.stockmaster',
if(or(equals(activity('stockmaster timestamp').status, 'Failed'),
equals(activity('stockmaster timestamp').output.firstRow.lastmodified, null)),
'',
concat(' WHERE lastmodified > ', '''', string(activity('stockmaster timestamp').output.firstRow.lastmodified), '''')))
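To explain the formula: it falls back to a plain SELECT * when the lookup activity failed or returned null (e.g. on the first run, when the target table is still empty), and otherwise appends a WHERE clause with the watermark.

For the final merge into the bronze Lakehouse table, a Spark SQL MERGE along these lines should work (the staging table and the id key column are placeholders for my real names):

MERGE INTO stockmaster AS t
USING stockmaster_staging AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *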