Hi,
I'm trying out the new incremental refresh feature for a Dataflow Gen2.
I'd like to better understand how the bucket size setting works.
First of all, is it related to the "DateTime column to filter by" setting or to the "Only extract new data when the maximum value in this column changes" setting?
If I choose a bucket size of one month, what is the resulting bucket range, and relative to which date?
Thanks
Hi @pmscorca ,
Based on your settings:
Setting the bucket size to month means that the data is partitioned by month based on the orderdate column. Since you are extracting data from the past 2 months, the dataflow creates a bucket for each month within that window.
For example, if today is October 2, 2024, the dataflow will create buckets for the following months: August 2024, September 2024, and October 2024.
These buckets are used for incremental refreshes. The dataflow checks the modifieddate column to determine whether the data within these buckets has changed. If the maximum value in the modifieddate column changes, the dataflow will refresh the entire bucket for that month.
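To illustrate, here is a minimal Python sketch of how the monthly buckets could be derived. This is illustrative only: `month_buckets` is a hypothetical helper, and the alignment of buckets to calendar-month boundaries (including the current partial month) is my assumption, not something the service guarantees in exactly this form.

```python
from datetime import date

def month_buckets(refresh_date: date, months_back: int) -> list[tuple[date, date]]:
    """Enumerate calendar-month (start, end) ranges covering the refresh window."""
    buckets = []
    year, month = refresh_date.year, refresh_date.month
    for _ in range(months_back + 1):  # include the current, partial month
        start = date(year, month, 1)
        # End is exclusive: the first day of the following month.
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        buckets.append((start, end))
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
    return list(reversed(buckets))

# Example from above: refresh on October 2, 2024, extracting the past 2 months.
for start, end in month_buckets(date(2024, 10, 2), 2):
    print(start, "->", end)
# 2024-08-01 -> 2024-09-01
# 2024-09-01 -> 2024-10-01
# 2024-10-01 -> 2024-11-01
```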
Best Regards,
Adamk Kong
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Hi, thanks for your reply. Below I've reported my incremental refresh settings.
The subject still isn't very clear to me.
A monthly bucket means that the data is partitioned by month based on the orderdate column, namely the "DateTime column to filter by". So, considering the two rows to import, I expected either 3 monthly bucket partitions (August, September and October, starting from the first order date, 08/10/2024, and up to the current calendar month) or only 1 partition, because the orders are all in August; but it seems that what is specified for the "Extract data from the past" option must also be considered (e.g. the chosen unit should be coherent with the bucket size).
The "Extract data from the past" option should apply only to the first load and not to the following ones: for October, November, December and so on, the dataflow should load the October orders, then the November orders, and then the December orders, evaluating the order date against the range of each monthly bucket partition (e.g. an order with order date 10/20/2024 falls into the October bucket partition). Is that right?
Moreover, the statement "If the maximum value in the modifieddate column changes, the dataflow will refresh the entire bucket for that month" is very interesting: it means that an unmodified order in the destination could be deleted and replaced if at least one other order in the same monthly bucket partition has changed, i.e. the order dates of the two orders fall into the same partition.
Ultimately:
- the "date column to filter by" allows to establish which data to consider for each load respect to a date,
- the "bucket size" allows to partition the data for each load respect to the previous date following a specified size. Each load consists of getting a next bucket partition from the source, not present in the destination,
- the "only extract new data when the maximum value in this column changes" allows to check the modified date for the data in bucket partition of the source with the corresponding modified date for the existing data in the same bucket partition of the destination; if the modified date in the source bucket partition is changed then this source partition must replace the destination bucket partition.
The first or initial bucket partition to get from the source should be the first one deriving from what specified for "extract data from the past" settings.
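If my understanding above is correct, the mechanics would look roughly like this Python sketch (the column names follow my settings; the rows and the bucket-key logic are just illustrative assumptions):

```python
from datetime import date

# Hypothetical rows: (orderdate, modifieddate, payload).
source = [
    (date(2024, 8, 10), date(2024, 8, 20), "order A, changed on Aug 20"),
    (date(2024, 8, 10), date(2024, 8, 10), "order B, unchanged"),
]
destination = [
    (date(2024, 8, 10), date(2024, 8, 10), "order A"),
    (date(2024, 8, 10), date(2024, 8, 10), "order B"),
]

def max_modified_per_bucket(rows):
    """Maximum modifieddate per monthly bucket, keyed by (year, month) of orderdate."""
    result = {}
    for orderdate, modifieddate, _ in rows:
        key = (orderdate.year, orderdate.month)
        result[key] = max(result.get(key, date.min), modifieddate)
    return result

src, dst = max_modified_per_bucket(source), max_modified_per_bucket(destination)
for key, src_max in src.items():
    if src_max != dst.get(key):
        # The whole bucket is replaced, so the unchanged order B is rewritten too.
        print(f"bucket {key}: max modifieddate changed, replace the entire partition")
```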
Thanks
Hi @pmscorca ,
Thanks for sharing your incremental refresh settings. Let's break down your points to clarify the concepts further.
Best Regards,
Adamk Kong
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Hey pmscorca!
I see you have some questions about buckets and how to use them.
For incremental refresh you need to configure a couple of settings to get going.
The first setting is the column you want to use to split the data into buckets ("DateTime column to filter by"). So, in case you have an orders table, you probably want to use the order date.
With the bucket settings you then define the range of data you want in scope. After the time range, you decide how to split it into multiple buckets and choose either day, week, month, quarter or year.
"Only extract new data when the maximum value in this column changes" is the column we use to detect whether there is new data available for a specific bucket and to line it up for refresh.
To learn more, we have a detailed explanation here:
Incremental refresh in Dataflow Gen2 - Microsoft Fabric | Microsoft Learn
Hi, thanks for your reply. I opened this post after reading the related documentation and running some tests.
I'd like to understand this feature better.
I need to import some Excel files as a source in order to write the related data into a Fabric warehouse.
For the first load, I have this data:
two orders dated August 10th.
Inside the dataflow gen 2 I have these settings:
- DateTime column to filter by = orderdate,
- Extract data from the past = 2 months,
- Bucket size = month,
- Only extract new data when the maximum value in this column changes = modifieddate.
What are the resulting bucket ranges with respect to the order date, in terms of start date and end date?
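To make the question concrete, this is the kind of mapping I'm trying to predict (a minimal sketch under my assumption that buckets are keyed by the calendar month of orderdate):

```python
from datetime import date

# My two rows, both with order date August 10, 2024.
order_dates = [date(2024, 8, 10), date(2024, 8, 10)]

# With bucket size = month, I expect each row to land in a calendar-month
# bucket keyed by the (year, month) of orderdate.
for d in order_dates:
    print(d, "-> bucket", (d.year, d.month))  # both fall into (2024, 8), August
```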
Thanks
Excel files are not a foldable source. Incremental refresh should only be used for large data volumes; buckets should contain at least 2 million rows for this to make sense.
Hi, I'm trying with a few rows to better understand the mechanism, and in any case I have no issues using Excel files.