Loving dataflows, but would love the ability to do an incremental refresh in more of an 'upsert' style. That is, maintain the uniqueness of a primary key within the dataset. Any ideas on how to do this? I assume it will look something like excuting some sort of M-style merge after the dataflow incremental refresh pulls in more data.
For refrence my table size is ~1 million rows.
I SO GLAD someone asked this question too!
Currently incremental refresh works for the data was updated based on datetime stamp, say last 10 days. It does not take into consideration the Primary Key that might have been inserted in the past.
Eg: lets say a "Sales Order" was created 3 months with a Status "Open", but only today its status was changed to "Shipped". Since I am only looking for past 10 days data for incremental refresh, a new Sales Order row gets inserted into the Dataset causing my "Sales Order" table to be duplicated.
How can we avoid this? Ideally along with the Datetime stamp, there needs to be a option where we specify the Unique Key column too. If incremental refresh contains any of the Unique Keys loaded in the past, they need to be deleted too and reinserted.
Any update regarding consideration of primary key when incrementally refreshing? I have a similar situation and I am not sure Power BI dataflows can solve my issue.
Is there any timeline for if/when it will be added as a feature for dataflows? This can be seen as a critical feature for refreshing data dependent on past state changes (i.e. Orders, Status, Deliveries, Inventory, Requests, etc)
The underlying technology for incremental refresh seems to be based on Partitions from SSAS. Therefore my guess is I don't think this will fundamentally change to support a primary key upsert any time soon.
Assuming therefore that when a record is updated it will appear in your incremental refresh therefore creating a duplicate you should add extra steps in your MCode to remove duplicates keeping the Max Primary Key therefore pruning the data in your Dataflows ETL process.
I assume the meaning of the question is similar to my own..
I can use a Date field in my data to sync only the last day for example. But my 'Upsert' question is will it be able to replace existing data if the primary key shows it already exists?
Lets say I have 10 records, one record created each day. If a user of my system updates record 2 that was originally created 2 says ago but updated today, will the incremental refresh 'Upsert' it? Or will this break the refresh?
When I read about partitions and how incremental refresh seemed to work in original Analysis Services it seems to be purely additive for new data incrementally.
That would work as either by making the incremental refresh look at the DateTime column of the updated data.
Or you could look at the following below when setting it up
What you can do is to also enable the option for Detect Changes, where you can update the column based on a date, which will be included in the incremental process
Hi @GilbertQ Thanks. But this does not take care of my scenario. If only depends on Max date in "Changes detected". If my date column is not falling in the rangeof teh incremental refresh cycle , it will still miss the older rows.
Let me know if I missed something
Yes it does.. but you will need another date column thats different from the Incremental refresh date.
Do you have any example or scenario that can help me understand how to use this option?
Check out the November 2023 Power BI update to learn about new features.
Read the latest Fabric Community announcements, including updates on Power BI, Synapse, Data Factory and Data Activator.
130+ sessions, 130+ speakers, Product managers, MVPs, and experts. All about Power BI and Fabric. Attend online or watch the recordings.