Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
kahn_ohara
Frequent Visitor

Daily Incremental Refresh

I am working with large snowflake data and hence would like to use incremental refresh. The data I'm working with has several columns with the key ones being create_timestamp, update_timestamp and unique_identifier (this should be the Primary key of the table after the data is refreshed). Overall, I'd like to provide 3 years worth of data from the refresh date based on create_Ts. Old data (based on create_ts) gets updated but only rarely. Otherwise only newer data gets updated. I'd like to refresh the data daily.

 

In an ideal world, I'd want to to perform these 4 steps

 

1. query the results with "where date(update_ts)=refresh date" to minimize database load/runtime. Load this data to Power BI.

2. have power bi append the results in step 1 with the existing historical data/table in power bi. Call this TABLE_A.

3. In power bi, for each unique_identifier in TABLE_A., keep only the record with the latest update_Ts and drop all others.

4. In power bi, apply filter based on create_Ts to last 3 years to TABLE_A. 

 

I reviewed Power BI's incremental refresh documentation and considered using an incremental refresh policy with "Set incremental period to 1 day prior to refresh" and archival period of 3 years based on update_ts. Would this accomplish steps 1 and 2 described above? Based on microsoft documentation it seems like this policy isn't recommended, though, and that we're supposed to use create_ts for the incremental refresh/archival period and not update_Ts. Can anyone expalin why? 

 

Everything created in the last 3 years is a subset of everything updated in the last 3 years so there's no issue there. When a record is first created create_Ts=update_Ts.

 

2 ACCEPTED SOLUTIONS
lbendlin
Super User
Super User

considered using an incremental refresh policy with "Set incremental period to 1 day prior to refresh" and archival period of 3 years based on update_ts. Would this accomplish steps 1 and 2 described above?

yes.  

 

If you use update_ts instead of create_ts  you will end up with duplicates of your rows across multiple partitions, and you will have to include occasional partition refreshes in your process (or consider using the "track changes" feature - at an enormous storage cost - to handle changes for you).

 

View solution in original post

3 REPLIES 3
kahn_ohara
Frequent Visitor

@lbendlin can you explain how the track changes feature works on the backend? Couldn't find any documentation.

lbendlin
Super User
Super User

considered using an incremental refresh policy with "Set incremental period to 1 day prior to refresh" and archival period of 3 years based on update_ts. Would this accomplish steps 1 and 2 described above?

yes.  

 

If you use update_ts instead of create_ts  you will end up with duplicates of your rows across multiple partitions, and you will have to include occasional partition refreshes in your process (or consider using the "track changes" feature - at an enormous storage cost - to handle changes for you).

 

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

Top Solution Authors
Top Kudoed Authors