The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Hi,
I'm trying to set up incremental refresh for a Gen1 Dataflow in PBI Service that queries an Oracle database.
The SQL query is rather complex with several joins and nested CTE that would result in a long load time if the table was refreshed entirely (~300K rows by 40 columns, about 1.5 hours refresh time).
The table does have separate CREATED_DATE and MODIFIED_DATE columns that I am trying to use with the Incremental Refresh Detect Data Changes functionality, with CREATED_DATE as the filter column and MODIFIED_DATE as the detect changes column.
The issue is that the Canary function that is automatically set up with the Incremental Refresh Detect Data Changes setting uses the full SQL query to detect max values, meaning that there is minimal benefit to using this setting.
I am trying to achieve a faster time to detect changes by using a minimalistic SQL query that only extracts the CREATED_DATE and MODIFIED_DATE columns from the source database, circumventing all the calculations and joins in the full SQL query. However, upon saving my changes, PBI automatically reverts the changes made to the Canary function.
Is there a way to get this method to stick? Or is there another method that could provide similar efficiencies?
An alternative that I am considering is to create a linked transformation dataflow that groups and filters out duplicate primary keys based on the latest MODIFIED_DATE, but that is not the best design.
Any help is greatly appreciated.
Best regards,
Brandon
Solved! Go to Solution.
Hi @brandonyeo Power BI Gen1 Dataflows don’t let you modify the Canary function used for detecting changes. It always uses the full query, so performance gains are limited in complex cases. To optimize, create a lightweight SQL view or staging dataflow with just primary key and MODIFIED\_DATE. You can also consider using Gen2 Dataflows for better incremental refresh control.
Hi @brandonyeo,
Thank you for posting your query in the Microsoft Fabric Community Forum, and thanks to @lbendlin & @Akash_Varuna for sharing valuable insights.
Could you please confirm if your query has been resolved by the provided solutions? This would be helpful for other members who may encounter similar issues.
Thank you for being part of the Microsoft Fabric Community.
You may be better off using manual refreshes of the older partitions via the XMLA endpoint ( ie by recreating the canary function in a more economical way).
Hi @lbendlin, thanks for the tip. Correct me if I'm wrong, but I don't think XMLA endpoint can be set up for dataflows? Most of the info I can find about this topic is on semantic models. The table in this dataflow gets used as a standard fact table across several reports, so I would prefer for it to be set up in a dataflow.
You are right, at the moment the XMLA based control only works for semantic models. If this is important to you please consider voting for an existing idea or raising a new one at https://ideas.fabric.microsoft.com
Hi @brandonyeo Power BI Gen1 Dataflows don’t let you modify the Canary function used for detecting changes. It always uses the full query, so performance gains are limited in complex cases. To optimize, create a lightweight SQL view or staging dataflow with just primary key and MODIFIED\_DATE. You can also consider using Gen2 Dataflows for better incremental refresh control.
You can also consider using Gen2 Dataflows for better incremental refresh control.
Do you have a reference for this?
Hi @Akash_Varuna, thanks for your reply. Yes, I ended up doing something like this - Add another lightweight view that gets the P_KEY and MODIFIED_DATE without the incremental refresh filter, then merging it with the original query as an inner join. Thankfully all historical changes in the db are stored in a separate audit table so the main table has unique P_KEY values only.
Unfortunately, we have not migrated entirely onto Fabric, so Gen2 Dataflows is not available for us.
User | Count |
---|---|
36 | |
14 | |
12 | |
10 | |
8 |
User | Count |
---|---|
44 | |
43 | |
19 | |
19 | |
18 |