I have tried so many settings for incremental refresh on large transaction tables from BC365 (general ledger entries, project ledger entries, etc.).
Nothing works - it only makes the dataflow 2x slower or makes it completely unable to run. Two questions:
1) Any advice on changing the settings? These are my current GL entry settings:
2) I have an existing GL entries table in my Fabric Warehouse. As I understand it, I am supposed to use the "existing table" setting like this -->
And then pick the existing GL entries table and check "replace":
Am I doing something wrong? I have read the documentation as well. I also tried using "new table" instead of "existing table", but that overwrites my GL entries table with only the changed data it found.
Thanks.
Hey @useruserhi91 !
My name is Jeroen Luitwieler, PM for incremental refresh in Dataflows Gen2. Super unfortunate to hear your experience is not as expected.
Something I noticed in your screenshots is that you have "require incremental refresh query to fully fold" turned off. Is there a reason you turned that setting off? Also, which connector are you using? Keen to learn more about your end-to-end situation.
Thanks for replying. I am using the standard Business Central connector. I turned off the query folding option because it said my query could not work with it enabled. Any ideas on what to do? Thanks 🙂
I see. I believe your query uses OData in the background, which should support some degree of folding. Would you mind sharing a screenshot of your applied steps with the folding indicator? There may be a step that causes folding to break.
When folding breaks and you allow incremental refresh to run anyway, the dataflow pulls all the data once per bucket (3 times in your case) and only applies the filters after the data is pulled. This causes the huge slowdowns you are experiencing right now.
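To illustrate the difference (a minimal sketch — the URL, entity name, and columns below are assumptions, not your exact query): a filter applied directly against the OData source can fold into the request, while a filter placed after a non-foldable step forces the full table to be downloaded first.

```powerquery-m
// Sketch only — endpoint, entity, and column names are assumptions.
let
    Source = OData.Feed("https://api.businesscentral.dynamics.com/v2.0/<tenant>/production/api/v2.0"),
    GLEntries = Source{[Name = "generalLedgerEntries"]}[Data],

    // Folds: translated into an OData $filter, so only matching rows travel.
    Filtered = Table.SelectRows(
        GLEntries,
        each [systemModifiedAt] >= #datetime(2024, 1, 1, 0, 0, 0)
    ),

    // Breaks folding: Table.Buffer forces local evaluation, so any filter
    // added AFTER this step runs against data that was already fully downloaded.
    Buffered = Table.Buffer(GLEntries)
in
    Filtered
```

With incremental refresh, that full download then happens once per bucket, which matches the slowdown described above.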
Yes, here is my query for GL entries from an advanced/custom BC API (the same error occurs if I use the standard API):
I basically navigate to BC -> production -> the correct company -> advanced API -> GL entries table. I then remove some columns and change data types like so:
What is the original data type that is returned by the connector? There may be some column changes or navigation steps that are breaking your folding. We do have some guidance listed here: Understanding query evaluation and query folding in Power Query - Power Query | Microsoft Learn
Folding is key to getting incremental refresh to work and should give you the performance you need. Each situation is different, and you may need to adjust your steps or use native queries to get what you need. @miguel, would you be able to help here a bit with suggestions?
You mean the data types the initial (BC) connector returns? These:
"Data" naturally contains the data I need. In the subsequent navigation steps, this "Data" column has these data types in chronological order:
"Table"
"Any"
"Table"
"Table"
And then comes my final table. Is the "Any" data type causing the error?
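If one navigation step really does evaluate as Any, one thing worth trying (a sketch with assumed step and column names — not confirmed as the fix for this case) is ascribing an explicit table type at that step, so downstream steps see a typed table instead of Any:

```powerquery-m
// Hypothetical step names and column types — adjust to your actual schema.
let
    Source = OData.Feed("https://api.businesscentral.dynamics.com/v2.0/<tenant>/production/api/v2.0"),
    NavStep = Source{[Name = "generalLedgerEntries"]}[Data],  // may evaluate as type Any
    // Ascribe the expected table type explicitly:
    Typed = Value.ReplaceType(
        NavStep,
        type table [
            entryNumber = Int64.Type,
            postingDate = date,
            systemModifiedAt = datetime,
            amount = number
        ]
    )
in
    Typed
```

Note that Value.ReplaceType only changes the declared type, so the ascribed type must actually match the table's structure.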
If these data points are not immutable then incremental refresh is not the right tool for what you are trying to achieve.
Custom partition management is not (yet) available for dataflows.
Consider loading your data directly into a semantic model, where you have fine-grained partition control.
Or consider loading your data into a data warehouse or data lake with pipelines.
Thanks for your reply. However, "systemModifiedAt" is of course NOT immutable - it changes. The whole point of an incremental refresh is to identify rows whose dateTime value changed recently, indicating that the row has indeed changed and must be updated. If this is not possible, what's the point of the incremental refresh feature? Can you give a concrete example of which columns I can use as input for grouping and column changes, and how these columns ought to behave? Thanks a lot!
"The whole point of an incremental refresh is to identify rows that have a dateTime value changed recently, indicating that the row has indeed changed, and must be updated. If this is not possible, what's the point of this incremental refresh feature?"
What you are looking for is differential refresh, which Power BI is not capable of. Incremental refresh only goes down to the partition level for semantic models and the bucket/file level for dataflows.
Any changes to past data require a full refresh or a manual refresh of all affected partitions. You would have to create and maintain that process yourself, and unfortunately that is not currently possible with dataflows. @LuitwielerMSFT, when can we refresh individual buckets?
In data semantics, "incremental refresh" has, throughout my years, always meant changing only updated or new records. This still seems to be the case when googling the term.
Also, from the MS documentation on incremental refresh in Dataflow Gen2, I see:
"When you use dataflows for data ingestion and transformation, there are scenarios where you specifically need to refresh only new or updated data—especially as your data continues to grow. The incremental refresh feature addresses this need."
And:
"Each bucket contains the data that changed since the last refresh. The dataflow knows what changed by checking the maximum value in the column that you specified. If the maximum value changed for that bucket, the dataflow retrieves the whole bucket and replaces the data in the destination"
Am I misunderstanding something here - shouldn't it simply look for updated data and replace only those rows that were updated in my table with the new data?
Yes, this is a misunderstanding. The lowest you can go is to replace the bucket or partition. You cannot replace individual rows.
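As an illustration of the bucket mechanics the documentation describes (a sketch only, not the engine's actual code; the table and column names are assumptions): per bucket, the dataflow compares the maximum of the tracked column against what it recorded last refresh, and if that maximum moved, it re-pulls and replaces the whole bucket at the destination.

```powerquery-m
// Sketch of bucket-level change detection — not the real implementation.
let
    Source = #table(
        type table [entryNumber = Int64.Type, systemModifiedAt = datetime],
        {
            {1, #datetime(2024, 1, 5, 0, 0, 0)},
            {2, #datetime(2024, 1, 20, 0, 0, 0)}
        }
    ),
    // One bucket = one time window on the tracked column.
    BucketStart = #datetime(2024, 1, 1, 0, 0, 0),
    BucketEnd   = #datetime(2024, 2, 1, 0, 0, 0),
    Bucket = Table.SelectRows(
        Source,
        each [systemModifiedAt] >= BucketStart and [systemModifiedAt] < BucketEnd
    ),
    // The engine checks only this maximum; if it exceeds the stored
    // watermark, the ENTIRE bucket is re-retrieved and replaced.
    MaxModified = List.Max(Bucket[systemModifiedAt])
in
    MaxModified
```

Nothing in this check compares individual rows, which is why row-level (differential) updates never happen.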
Well, that is interesting. Thanks for clarifying - this was not clear to me from the documentation, and I found the wording of "incremental refresh" misleading. What is the use case for this feature, then?
Mostly for immutable data. If your data is mutable then you need to manage the partition updates manually.
@LuitwielerMSFT when can we refresh individual buckets in dataflows?