Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now
Hello,
I've a webservice to import a json file.
This json contains a lot of data and I'd like to not have to import the whole json anymore.
So I added a parameter to my service to filter the data.
My request (I hardcoded the parameter for the example)
it's nice, I can import the latest data but I need my old data.
Is there a way to keep my old data in another table and append the new ones on every refresh?
for example NewData is the webservice query result. Historian is my archive.
Hello! This is my current dilemma too. How were you able to solve it?
Thanks for your support.
I set up my incremental refresh this way.
I'm using 'RangeStart' as a parameter for my web service request to not load full data (json). I can consider that they are in the archive partition of my dataset (on the service).
my table
my incremental refresh policy is 1 year of archived data, 3 days of refreshed data.
I understand that for my incremental refresh, Power BI will delete all data whose 'LASTMODIFICATION' date is less than -1 year and greater than -3 days. Append the data from my json whose 'LASTMODIFICATION' date is greater than -3 days.
But if I modified a record who are already in the archive but before the RangeStart?
My PowerBI dataset on 2022-06-23 00:00:00
ID | RECORDLINK | DATE | WORKHOUR | LASTMODIFICATION |
1 | 252654 | 20-06-2022 | 6 | 20-06-2022 07:06 |
2 | 264864 | 11-06-2022 | 7 | 11-06-2022 09:45 |
3 | 1515496 | 22-06-2022 | 2,5 | 22-06-2022 17:06 |
The record I want to refresh
ID | RECORDLINK | DATE | WORKHOUR | LASTMODIFICATION |
2 | 264864 | 11-06-2022 | 7,5 | 23-06-2022 09:45 |
What will happen? my record with ID = 2 in the archive is before the RangeStart of incremental Refresh.
PowerBI duplicate this record?
Yes. Keep in mind this is incremental refresh, not differential refresh. It is your responsibility to select a refresh range that is big enough to cover such a scenario. Three days is too short.
Learn about refreshing individual partitions.
I understand.
But even if I do a refresh over a year, I could never guarantee that my data is perfect.
I take my previous example. If I sort my data by modification date and remove duplicates (step after the custom filter on RangeStart/RangeEnd in my query). Can this work?
That might work but it will become very expensive very quickly. You would have to sort the entire data source in descending order before applying the "Remove duplicates" transform
It will be easier to run a full refresh every once in a while, even if done by partition.
unfortunately, that doesn't work.
I think only new data is considered in the power Query. I tested by ordering and then removing the duplicates, but the duplicates remain.
Power bi considers that its archive is never modified.
Either your web service supports queries for date ranges (not just individual dates) - in which case you can set up incremental refresh.
Or you need to do all the processing/storage further upstream from Power BI.
In fact, you should understand my query parameter as (>=) greater than or equal to the date.
But I can modify my webservice.
But I don't understand why the request of my webservice impacts the incremental refresh of Power BI. Do you have any documentation/reference on it?
Check out the October 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
117 | |
108 | |
108 | |
93 | |
70 |
User | Count |
---|---|
173 | |
135 | |
131 | |
96 | |
94 |