Hello everyone,
I have two significant questions. This will be a long message, so thanks in advance for reading and commenting.
I have an incremental refresh configured as in the image below.
My first question: what exactly is the bucket size? I know Microsoft's definition: "This setting is required and specifies the size of the buckets that the dataflow uses to filter the data. The dataflow divides the data into buckets based on the DateTime column. Each bucket contains the data that changed since the last refresh. The bucket size determines how much data is processed in each iteration. A smaller bucket size means that the dataflow processes less data in each iteration, but it also means that more iterations are required to process all the data. A larger bucket size means that the dataflow processes more data in each iteration, but it also means that fewer iterations are required to process all the data."
But should I think of it as in the image below? Is my understanding correct? For example, I have data for 2020-2024, but I extract only 2 years of data. So the data from 2020.01 to 2022.10 will be stored, and 2022.11-2024.11 will have buckets based on ModificationDate. If ModificationDate increases for all rows, will all buckets be refreshed? And if today's date were something like 2027.12, would the data in Fabric cover 2022-2027? I ask because Dataflow Gen1 is different, since it keeps archived data.
Is the perspective in the image below correct? If I'm wrong, could you please explain with concrete dates or images? Thank you.
My second question is about the "Only extract data for concluded periods" setting. How exactly does it work? I know Microsoft's definition: "
This setting is optional and specifies whether the dataflow should only extract data for concluded periods. If this setting is enabled, the dataflow only extracts data for periods that concluded. So, the dataflow only extracts data for periods that are complete and don't contain any future data. If this setting is disabled, the dataflow extracts data for all periods, including periods that aren't complete and contain future data.
For example, if you have a DateTime column that contains the date of the transaction and you only want to refresh complete months, you can enable this setting in combinations with the bucket size of month. Therefore, the dataflow only extracts data for complete months and doesn't extract data for incomplete months."
But my tests confused me even more. When the setting is disabled, it should fetch all the data, right? Suppose I have values for the years 2020-2026 (today's date = 11/06/2024). It should fetch all of them, but it does not. It fetches something like 01/01/2020-11/30/2024. The weird thing is that dates after today are fetched only if they increase one step at a time; if they do not increase one by one, they are not fetched.
What about when "Only extract data for concluded periods" is enabled? What should I expect? As far as I understand, it should fetch only the 01/01/2020-11/30/2024 values. But again, dates after today are fetched only if they increase one by one; otherwise they are not fetched.
Thanks for reading. Thanks for support.
Hi @fabconmvp
For your first question: if you set a bucket size (e.g., Month or Year), the dataflow creates "buckets" covering periods of that size. When the data refreshes, only buckets containing data modified since the last refresh are processed. For example:
With a Year bucket size, the data from 2022 to 2024 will be divided by year, and only the latest year's bucket will be refreshed if only new data is added for that year.
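The bucket idea above can be sketched in a few lines of Python. This is only an illustrative model, not how Fabric is implemented: it assumes rows are bucketed by the year of the ModificationDate column from the question, and that a bucket counts as "changed" when its latest ModificationDate differs from the one remembered at the previous refresh. The helper names and the sample rows are hypothetical.

```python
from datetime import datetime

def max_modified_per_year(rows):
    """Group rows into year buckets and keep the latest ModificationDate per bucket."""
    latest = {}
    for row in rows:
        stamp = row["ModificationDate"]
        year = stamp.year
        if year not in latest or stamp > latest[year]:
            latest[year] = stamp
    return latest

def buckets_to_refresh(previous, current):
    """Return the year buckets whose latest ModificationDate changed since the last refresh."""
    return sorted(year for year, stamp in current.items() if previous.get(year) != stamp)

# State remembered from the previous refresh (hypothetical values).
previous_state = {
    2022: datetime(2022, 12, 30),
    2023: datetime(2023, 12, 29),
    2024: datetime(2024, 10, 31),
}

# Current source rows: only one new row was added, in 2024.
rows = [
    {"id": 1, "ModificationDate": datetime(2022, 12, 30)},
    {"id": 2, "ModificationDate": datetime(2023, 12, 29)},
    {"id": 3, "ModificationDate": datetime(2024, 10, 31)},
    {"id": 4, "ModificationDate": datetime(2024, 11, 5)},
]

current_state = max_modified_per_year(rows)
print(buckets_to_refresh(previous_state, current_state))  # [2024]
```

In this model, if ModificationDate increased for rows in every year, every bucket would be reported as changed.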
For your second question, the "Only extract data for concluded periods" option affects what data is processed during each refresh:
When Enabled: The dataflow will only refresh data from completed periods based on your bucket size. For instance, if your bucket size is a month, and today is November 6, 2024, the refresh will process only data from months that have fully concluded (up to October 2024). It won’t include partial data from November 2024, as the month is still ongoing.
When Disabled: The dataflow can include data from periods that are not fully concluded (e.g., partial data for the current month). However, it may not fetch all data if certain date sequences are not incrementing as expected (e.g., gaps in data or non-sequential dates within the bucket size). If, for instance, there’s an inconsistency in the DateTime sequence within your specified range, the dataflow may only fetch the most recent continuous dates up to the last concluded date (e.g., 01/01/2020 - 11/30/2024 in your case).
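As a rough sketch of the Enabled/Disabled difference for a Month bucket, the newest bucket eligible for refresh can be computed as follows. The function name is hypothetical and the code only mirrors the description above (the current month is skipped when the option is enabled); it does not model the behaviour with gaps or future dates mentioned for the Disabled case.

```python
from datetime import date

def last_month_bucket(today, only_concluded):
    """Return (year, month) of the newest month bucket eligible for refresh."""
    if not only_concluded:
        # The ongoing, incomplete month is included.
        return (today.year, today.month)
    if today.month == 1:
        # In January, the last concluded month is December of the previous year.
        return (today.year - 1, 12)
    return (today.year, today.month - 1)

today = date(2024, 11, 6)
print(last_month_bucket(today, only_concluded=True))   # (2024, 10)
print(last_month_bucket(today, only_concluded=False))  # (2024, 11)
```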
Best Regards,
Jayleny
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
Hello @v-jialongy-msft,
Thanks for the comment. Could you give me an example for the case where I have data for the years 2000-2050 (today's date: 11/07/2024)? How does incremental refresh work with and without the "Only extract data for concluded periods" setting? Also, could you explain the bucket size too? (Extract data from past year = 2)
Hi @fabconmvp
1. With "Only extract data for concluded periods" Enabled
Concept of Complete Periods: With a bucket size of "year," the complete period is a full year.
Refresh Behavior: The dataflow will bucket data by year and only refresh data for fully completed years.
Example on the Current Date (November 7, 2024):
Since 2024 is not yet finished, the dataflow will only refresh data for the most recent fully completed year, 2023.
2024 data won’t be refreshed until the year ends.
So, the refresh will cover only data for 2023.
2. With "Only extract data for concluded periods" Disabled
Refresh Behavior: Even if the current year hasn’t concluded, the dataflow will still refresh data for the ongoing year.
Example on the Current Date (November 7, 2024):
The dataflow will refresh data for both 2023 and 2024 (up to today’s data), because with this setting disabled, the dataflow includes the ongoing, incomplete year.
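To tie this back to the 2000-2050 example from the question, here is a small Python sketch that simply encodes the two cases as described in this reply: the last complete year is refreshed, plus the ongoing year (up to today) when the option is disabled. It is an assumption-laden illustration, not confirmed Fabric behaviour; the function names and the one-row-per-year sample data are hypothetical.

```python
from datetime import date

def refreshed_years(today, only_concluded):
    """Years refreshed according to the reply: the last complete year,
    plus the ongoing year when the option is disabled."""
    years = [today.year - 1]
    if not only_concluded:
        years.append(today.year)
    return years

def rows_in_refresh(row_dates, today, only_concluded):
    """Keep rows that fall in a refreshed year and are not dated after today."""
    years = set(refreshed_years(today, only_concluded))
    return [d for d in row_dates if d.year in years and d <= today]

today = date(2024, 11, 7)
# One hypothetical row per year, dated June 1, for 2000 through 2050.
row_dates = [date(year, 6, 1) for year in range(2000, 2051)]

enabled = rows_in_refresh(row_dates, today, only_concluded=True)
disabled = rows_in_refresh(row_dates, today, only_concluded=False)
print([d.year for d in enabled])   # [2023]
print([d.year for d in disabled])  # [2023, 2024]
```

In this model, rows dated 2025 and later are never picked up, because they lie after today's date in both cases.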
Best Regards,
Jayleny
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.