Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now

Reply
MattRGX
Regular Visitor

incremental refresh and partitionning

Hello guys,

i'm implementing incremental refresh in my dashboard and i'm facing some questions about partitionning and how it works.

in my DWH i have multiple fact table that i want to set incremental refresh on.

 

For each of this table i have a creation_date which i use to determine the range period for history.

i have also toggled on the "detect data changes" option based on another column update_date. I think i'm following the right path for the moment.

 

if i understood correctly how "detect data changes" works, after each refresh it stores the max update_date for each partition, and on the next refresh, it compares the max date of each partition to see if there is a new max date. And it refresh that partition if there is a new max date.

the question i have is about partitionning. I've open my dataset with tabular editor and saw all the partitions created automatically after the initial refresh (from daily for the firsts month to yearly for the far years).

 

My dataset in composed of data from multiple clients. And i will be adding more client's data in the future if a new client ask for it.

I don't understand how partition will be created. Are they based on the creation_date available in my DWH ?


If it is the case, the first time i insert a new client in the dataset, all his data (from all fact table) will have the same creation_date. So all the initial data from one client will be on the same partition ?

And let say a few days later a client make a single change to his initial data, so only one row is changing somewhere, the max update_date for that partition will change and i will refresh all the data from that client just because of one line change ? i don't really consider this as incremental refresh because it will load all data all the time instead of only the data that are changing.

 

Should i create custom partitions to avoid reload all client data if only one row is changing ?

 

May be i'm completely wrong in my understanding of the incremental refresh and partitionning.

Thanks a lot for your help

1 ACCEPTED SOLUTION
lbendlin
Super User
Super User

how partition will be created. Are they based on the creation_date available in my DWH ?

yes

the first time i insert a new client in the dataset, all his data (from all fact table) will have the same creation_date. So all the initial data from one client will be on the same partition ?

yes

the max update_date for that partition will change and i will refresh all the data from that client just because of one line change ?

Not just from that client. The entire partition will be refreshed

 

Incremental Refresh expects immutable data.  Your data isn't. That means you have additional maintenance effort to make sure there is no data duplication across partitions. That's just how it is.

View solution in original post

3 REPLIES 3
lbendlin
Super User
Super User

how partition will be created. Are they based on the creation_date available in my DWH ?

yes

the first time i insert a new client in the dataset, all his data (from all fact table) will have the same creation_date. So all the initial data from one client will be on the same partition ?

yes

the max update_date for that partition will change and i will refresh all the data from that client just because of one line change ?

Not just from that client. The entire partition will be refreshed

 

Incremental Refresh expects immutable data.  Your data isn't. That means you have additional maintenance effort to make sure there is no data duplication across partitions. That's just how it is.

thanks for your answers. It is really weird from a tool like power bi that we can't handle properly incremential update.

You are mixing concepts. What you need is differential refresh. Power BI only offers incremental refresh.  (Using the "detect data changes"  option makes it worse, by the way. That creates a canary, pretty much doubling the storage cost.)

Helpful resources

Announcements
OCT PBI Update Carousel

Power BI Monthly Update - October 2024

Check out the October 2024 Power BI update to learn about new features.

September Hackathon Carousel

Microsoft Fabric & AI Learning Hackathon

Learn from experts, get hands-on experience, and win awesome prizes.

October NL Carousel

Fabric Community Update - October 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors