I have a source table, SFTP_Data, in SQL Server that is updated regularly, and the volume is large, about 200,000 rows per update. From this source data I need to produce Dimension and Fact tables, which are also stored in SQL Server. Building the Dimension and Fact tables requires transformations of the source data.
So for each update of the source data, the Dimension and Fact tables need to be updated as well, with the new rows appended to them. I am thinking of using Fabric to perform the transformations, but I do not want to pull all of this large data into Fabric to do so. Can you suggest ways or methods to do this, for example using a Lookup activity to connect to the tables?
But with the Lookup activity, how can we perform the transformation from it? Can a Lookup activity that retrieves the source data be connected to a Dataflow to do the transformation? I am having difficulty connecting the two together. Also, does the Lookup activity pull the entire source data into Fabric?
Hi @tan_thiamhuat , Thank you for reaching out to the Microsoft Community Forum.
Create a Watermark table in SQL Server to track the last processed timestamp. Each pipeline run should extract only rows newer than this value, enabling incremental processing and reducing unnecessary data movement. Use a Lookup activity in Microsoft Fabric to retrieve the current watermark value. Use a Copy activity to query new rows from the source based on the watermark and write them to a Fabric Lakehouse (Delta table) or a staging SQL Server table. This activity streams data efficiently and does not load it into pipeline memory, making it suitable for large datasets.
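For reference, here is a minimal T-SQL sketch of that watermark pattern, assuming the source table has a LastModifiedDate column; the table and column names are illustrative, not taken from your schema. The Lookup activity runs the watermark SELECT, and the Copy activity's source query receives the value from the Lookup output (firstRow.WatermarkValue) through the pipeline's dynamic content.

```sql
-- Watermark table: one row per source table, holding the last processed timestamp.
CREATE TABLE dbo.WatermarkTable
(
    TableName      NVARCHAR(128) NOT NULL PRIMARY KEY,
    WatermarkValue DATETIME2     NOT NULL
);

-- Seed the watermark once with a date older than any source row.
INSERT INTO dbo.WatermarkTable (TableName, WatermarkValue)
VALUES (N'SFTP_Data', '1900-01-01');

-- Query used by the Lookup activity to return the current watermark.
SELECT WatermarkValue
FROM dbo.WatermarkTable
WHERE TableName = N'SFTP_Data';

-- Source query for the Copy activity: only rows newer than the watermark.
-- @LastWatermark is a placeholder; in the pipeline it is injected from the
-- Lookup activity's output via a dynamic content expression.
SELECT *
FROM dbo.SFTP_Data
WHERE LastModifiedDate > @LastWatermark;
```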
Once the incremental data is staged, use Dataflows Gen2 for low-code, visual transformations such as deduplication, joining with Dimension tables, assigning surrogate keys and aggregating measures. For more complex or custom transformation logic, use a Spark Notebook, which offers full control through code. Write the transformed outputs directly to the Dimension and Fact tables in SQL Server. After loading, use a Script activity to update the watermark with the latest processed timestamp. This ensures that only new data is picked up in future runs, keeping the process efficient and reliable.
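As an illustration of the load step, a rough T-SQL sketch follows for upserting a Dimension table, appending to the Fact table with surrogate-key lookups, and advancing the watermark from the Script activity. All object names here (staging.SFTP_Data_Incremental, DimCustomer, FactSales) are placeholders, not from your model.

```sql
-- Upsert the Dimension from the staged incremental rows (surrogate key is an IDENTITY column).
MERGE dbo.DimCustomer AS tgt
USING (SELECT DISTINCT CustomerCode, CustomerName
       FROM   staging.SFTP_Data_Incremental) AS src
      ON tgt.CustomerCode = src.CustomerCode
WHEN MATCHED THEN
    UPDATE SET tgt.CustomerName = src.CustomerName
WHEN NOT MATCHED THEN
    INSERT (CustomerCode, CustomerName)
    VALUES (src.CustomerCode, src.CustomerName);

-- Append to the Fact table, resolving surrogate keys by joining back to the Dimension.
INSERT INTO dbo.FactSales (CustomerKey, OrderDate, Amount)
SELECT d.CustomerKey, s.OrderDate, s.Amount
FROM staging.SFTP_Data_Incremental AS s
JOIN dbo.DimCustomer AS d
  ON d.CustomerCode = s.CustomerCode;

-- Script activity: advance the watermark to the newest timestamp just processed.
UPDATE dbo.WatermarkTable
SET WatermarkValue = (SELECT MAX(LastModifiedDate) FROM staging.SFTP_Data_Incremental)
WHERE TableName = N'SFTP_Data';
```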
Alternatively, if the transformation logic can be handled in T-SQL, use Fabric to orchestrate a stored procedure that performs the transformations and loads directly within SQL Server. This approach reduces Fabric compute usage and keeps processing close to the data.
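A sketch of such a stored procedure, again with illustrative names, could look like the block below; the pipeline would then only need a Stored procedure activity to invoke it.

```sql
-- Illustrative stored procedure: incremental extract, transform and load in one call.
CREATE OR ALTER PROCEDURE dbo.usp_LoadDimFactIncremental
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Watermark DATETIME2 =
        (SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = N'SFTP_Data');

    -- Stage only the rows added or changed since the last run.
    SELECT *
    INTO #Incremental
    FROM dbo.SFTP_Data
    WHERE LastModifiedDate > @Watermark;

    -- Dimension and Fact loads go here (MERGE / INSERT, as sketched above).

    -- Advance the watermark only when new rows were actually processed.
    IF EXISTS (SELECT 1 FROM #Incremental)
        UPDATE dbo.WatermarkTable
        SET WatermarkValue = (SELECT MAX(LastModifiedDate) FROM #Incremental)
        WHERE TableName = N'SFTP_Data';
END;
```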
If this helped solve the issue, please consider marking it 'Accept as Solution' so others with similar queries may find it more easily. If not, please share the details; I'm always happy to help.
Thank you.
Hi @tan_thiamhuat , Please let us know if your issue is solved. If it is, consider marking the answer that helped 'Accept as Solution', so others with similar queries can find it easily. If not, please share the details.
Thank you.