Hi,
I'm having an issue getting incremental refresh to operate as I'd expect, and am hoping you might be able to help me out.
I am building a set of reports on data from our Microsoft Dynamics PSA CRM tool. The desire is to have these reports show data as close to live as possible, meaning a scheduled refresh every half hour. To achieve that I need to have the refresh duration comfortably under 30 minutes.
My architecture / data pipeline looks like this: Dynamics -> SQL DB Export -> Dataflow (Extract) -> Dataflow (Transform 1) -> Dataflow (Transform 2) -> Dataset -> Report
In order to achieve the short (and efficient) refresh I have incremental refresh set up on the Dataflow (Extract) entities. The tables coming from Dynamics all have two datetime fields, createdon and modifiedon, which reflect when the record was created and when it was last modified. I have tried several combinations of incremental refresh settings with those date fields and have not had the behaviour I would expect.
Desired incremental refresh behaviour:
Tested settings and results:
[A] No incremental refresh:
[B] Filter on createdon, store for 100 years, refresh from past 1 days, detect data changes on modifiedon:
[C] Filter on modifiedon, store for 100 years, refresh from past 1 days, detect data changes on modifiedon:
[D] Filter on createdon, store for 100 years, refresh from past 1 days, detect data changes off:
[E] Filter on modifiedon, store for 100 years, refresh from past 1 days, detect data changes off:
This behaviour doesn't seem to be correct, according to my understanding of the intention of incremental refresh. Please could you help me see my mistake(s) and misunderstanding, and give guidance on which datetime fields are the right ones to use?
Thanks for your help
Hi - An update on the solution to my issue....
My mistake was in the columns I used to configure Incremental Refresh. This was due to my misunderstanding of how incremental refresh works - which I think could be improved across Microsoft documentation and others' tutorials/guides.
The correct combination is:
This is because incremental refresh creates partitions (groups) of data using the value in the Filter (1) column (i.e. when that record was created, grouped into month-long partitions). It is this Filter (1) column that is used as the 'key' for incremental refresh (not the actual table primary key). The maximum value of the Detect changes column (2) is calculated for each group; if that changes, the entire group is discarded and reloaded (by re-running the query filtered on the Filter (1) column). What it does not do is track specific records using their primary key.
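To make the grouping idea concrete, here is a minimal Python sketch of the behaviour as I describe it above. It is a toy model, not how the Power BI service is actually implemented: the function names (`partition_key`, `max_modified_by_partition`, `partitions_to_reload`) and the sample rows are made up for illustration, and I'm assuming month-long partitions keyed on createdon.

```python
from datetime import datetime

def partition_key(created_on):
    # Assumption: partitions are month-long and keyed on the Filter (1) column.
    return (created_on.year, created_on.month)

def max_modified_by_partition(rows):
    # Maximum value of the Detect changes (2) column within each partition.
    result = {}
    for row in rows:
        key = partition_key(row["createdon"])
        if key not in result or row["modifiedon"] > result[key]:
            result[key] = row["modifiedon"]
    return result

def partitions_to_reload(stored_max, source_rows):
    # A partition is flagged when its current maximum differs from the stored one.
    current = max_modified_by_partition(source_rows)
    return sorted(k for k, v in current.items() if stored_max.get(k) != v)

# Hypothetical sample data: two records created in January, one in February.
rows = [
    {"id": 1, "createdon": datetime(2024, 1, 5), "modifiedon": datetime(2024, 1, 5)},
    {"id": 2, "createdon": datetime(2024, 1, 20), "modifiedon": datetime(2024, 1, 21)},
    {"id": 3, "createdon": datetime(2024, 2, 3), "modifiedon": datetime(2024, 2, 3)},
]
stored_max = max_modified_by_partition(rows)  # state after the first load

rows[1]["modifiedon"] = datetime(2024, 3, 1)  # record 2 is edited in March
changed = partitions_to_reload(stored_max, rows)
print(changed)  # [(2024, 1)]
```

Note that the whole January group is flagged, including record 1, which did not change. Nothing in this model looks at the `id` column.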
My understanding is: when the data is refreshed:
This means that the value in the Filter (1) column should not change for a given record through its lifetime. If it does, then that record will fall under more than one partition query through its lifetime. If/when partitions are refreshed, that record could be duplicated in your loaded data.
Detect changes column (2) must always change whenever any field of the record changes through its lifetime. If it does not, then that record (the partition that record falls in) will not be flagged as changed, and won't be refreshed. It makes sense that you could use a Detect changes column (2) which is only updated when (for example) columns A, B or C are changed if those are the only columns you load into your data model.
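The duplication case can be sketched in the same toy style. Again, this is only a simplified model under my own assumptions (month-long partitions, and a refresh that reloads only the partition the edited record currently falls into); `month_of`, `refresh`, `loaded_ids` and `simulate` are hypothetical helper names, not anything from Power BI.

```python
from datetime import datetime

def month_of(value):
    # Assumption: month-long partitions.
    return (value.year, value.month)

def refresh(store, source_rows, filter_column, partitions):
    # Discard and reload each listed partition by filtering the source
    # on the chosen Filter (1) column. Other partitions are left as they are.
    for key in partitions:
        store[key] = [dict(r) for r in source_rows
                      if month_of(r[filter_column]) == key]

def loaded_ids(store):
    return sorted(r["id"] for rows in store.values() for r in rows)

def simulate(filter_column):
    row = {"id": 1,
           "createdon": datetime(2024, 1, 5),
           "modifiedon": datetime(2024, 1, 5)}
    store = {}
    refresh(store, [row], filter_column, [(2024, 1)])  # initial load
    row["modifiedon"] = datetime(2024, 3, 10)          # record edited in March
    # Reload only the partition the edited record now falls into.
    refresh(store, [row], filter_column, [month_of(row[filter_column])])
    return loaded_ids(store)

ids_filter_on_modifiedon = simulate("modifiedon")
ids_filter_on_createdon = simulate("createdon")
print(ids_filter_on_modifiedon)  # [1, 1]
print(ids_filter_on_createdon)   # [1]
```

With modifiedon as the filter, the stale copy stays behind in the January partition while a fresh copy lands in March, so the record appears twice. With createdon as the filter, the record never moves partition, so reloading January simply replaces it.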
Hope that helps!
Hi John,
Did you find a solution to operate incremental refresh without duplicates?
Thank you!
Hi,
I have the same problem: using an update date for incremental refresh generates duplicate values.
The problem is that I only refresh the last 10 days, but a row's previous update could be much older (more than a year ago).
Any advice? Why is the original row not updated during the second refresh?
Hi,
Any suggestions as to where I'm going wrong, or is this expected behaviour?
Thanks for any help you can give
Hi,
Yes, I would like only data that has changed to be refreshed, while reflecting the contents of the original database exactly (no duplicates, and reflecting all changes).
I have configured incremental refresh as per the linked article. It doesn't specify which datetime field to use, or how the behaviour would change by using different datetime fields.
The example given in that article uses refresh on the createdon datetime field. That is [D] from my tests (original post) and results in the dataflow not reflecting changes made to the source database.
I have also tried with detect data changes on, via the modifiedon datetime field, which is [B] from my tests. That gives the same (incorrect) results.
Is this expected behaviour? And/or what settings should I be using?
Thanks for your help
Hi @john_ach ,
Only data that's changed needs to be refreshed under incremental refresh.
You can follow this document to set up incremental refresh for dataflows:
Configuring incremental refresh for dataflows
Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.