Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
BigLo
Helper II
Helper II

Json Incremental Refresh

Hello,

I've a webservice to import a json file.
This json contains a lot of data and I'd like to not have to import the whole json anymore.
So I added a parameter to my service to filter the data.
My request (I hardcoded the parameter for the example)

 

BigLo_0-1655453322591.png

it's nice, I can import the latest data but I need my old data.
Is there a way to keep my old data in another table and append the new ones on every refresh?
for example NewData is the webservice query result. Historian is my archive.

BigLo_0-1655454574483.png

 

 

9 REPLIES 9
crln-blue
Post Patron
Post Patron

Hello! This is my current dilemma too. How were you able to solve it?

 

Thanks for your support.

 

I set up my incremental refresh this way.
I'm using 'RangeStart' as a parameter for my web service request to not load full data (json). I can consider that they are in the archive partition of my dataset (on the service).

 

BigLo_1-1655969498213.png

my table

BigLo_3-1655970473557.png

 

my incremental refresh policy is 1 year of archived data, 3 days of refreshed data.

 

I understand that for my incremental refresh, Power BI will delete all data whose 'LASTMODIFICATION' date is less than -1 year and greater than -3 days. Append the data from my json whose 'LASTMODIFICATION' date is greater than -3 days.
But if I modified a record who are already in the archive but before the RangeStart?

My PowerBI dataset on 2022-06-23 00:00:00

IDRECORDLINKDATEWORKHOURLASTMODIFICATION
125265420-06-2022620-06-2022 07:06
226486411-06-2022711-06-2022 09:45
3151549622-06-20222,522-06-2022 17:06

The record I want to refresh

IDRECORDLINKDATEWORKHOURLASTMODIFICATION
226486411-06-20227,523-06-2022 09:45

 

 

What will happen? my record with ID = 2 in the archive is before the RangeStart of incremental Refresh.

PowerBI duplicate this record? 

 

 

Yes. Keep in mind this is incremental refresh, not differential refresh. It is your responsibility to select a refresh range that is big enough to cover such a scenario. Three days is too short.

Learn about refreshing individual partitions.

I understand.

 

But even if I do a refresh over a year, I could never guarantee that my data is perfect.

 

I take my previous example. If I sort my data by modification date and remove duplicates (step after the custom filter on RangeStart/RangeEnd in my query). Can this work?

That might work but it will become very expensive very quickly. You would have to sort the entire data source in descending order before applying the "Remove duplicates" transform 

 

It will be easier to run a full refresh every once in a while, even if done by partition.

unfortunately, that doesn't work.
I think only new data is considered in the power Query. I tested by ordering and then removing the duplicates, but the duplicates remain.
Power bi considers that its archive is never modified.

lbendlin
Super User
Super User

Either your web service supports queries for date ranges (not just individual dates) - in which case you can set up incremental refresh.

 

Or you need to do all the processing/storage further upstream from Power BI. 

In fact, you should understand my query parameter as (>=) greater than or equal to the date.
But I can modify my webservice.
But I don't understand why the request of my webservice impacts the incremental refresh of Power BI. Do you have any documentation/reference on it?

Helpful resources

Announcements
July 2024 Power BI Update

Power BI Monthly Update - July 2024

Check out the July 2024 Power BI update to learn about new features.

July Newsletter

Fabric Community Update - July 2024

Find out what's new and trending in the Fabric Community.