Hi all,
I'm struggling with incremental refresh behavior again...
Here's the part of the query that acts weirdly:
#"GroupedData" = Table.Group(#"Removed Columns", {"customer", "fiscper"},
{
{"MaxCalday", each List.Max([calday]), type date}
}
),
#"MergedData" = Table.Join(#"Removed Columns", {"customer", "fiscper", "calday"}, GroupedData, {"customer", "fiscper", "MaxCalday"}),
In PBI Desktop it works properly, but once published to PBI Service, for some reason Feb 26th appears.
This does not happen often, but it happens enough that the report is incorrect.
Is there something I am missing?
It acts the same way whether I set the refresh to the last 10 days, the last 1 month, or the last 4 months.
Thank you!
I don't see the Incremental Refresh connection to your query. Please clarify.
Sorry, I missed explaining that part.
When I publish the report without the incremental refresh setup (so, just whatever is in the startdate-enddate range), it works and calculates properly.
The random surplus dates show up when I publish the report with incremental refresh, using this setup:
Data is sourced from Azure Databricks, if it makes a difference.
Show the part of your Power Query that applies the filters.
NOTE: RangeStart must be inclusive, RangeEnd must be exclusive (or the other way round). They cannot both be inclusive.
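A minimal sketch of that convention (Source, [calday], and the step name are placeholders; RangeStart and RangeEnd are the reserved datetime parameters, and [calday] is assumed to be datetime here):

// Inclusive lower bound, exclusive upper bound, so adjacent partitions
// never double-count rows that fall exactly on the boundary value.
#"Filtered Rows" = Table.SelectRows(Source, each [calday] >= RangeStart and [calday] < RangeEnd)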
Fixed that yesterday 🙂
Here's the whole code; I've just manually replaced some names etc., so please ignore any typos I may have made. The full structure is here:
let
    Source = Databricks.Catalogs("some_numbers.azuredatabricks.net", "/sql/1.0/warehouses/some_code", [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null]),
    hive_metastore_Database = Source{[Name="hive_metastore",Kind="Database"]}[Data],
    schema_name_Schema = hive_metastore_Database{[Name="schema_name",Kind="Schema"]}[Data],
    table_name_Table = schema_name_Schema{[Name="table_name",Kind="Table"]}[Data],
    // Incremental refresh window: inclusive lower bound, exclusive upper bound
    #"Incremental Refresh Filter" = Table.SelectRows(table_name_Table, each [calday] >= Date.From(RangeStart) and [calday] < Date.From(RangeEnd)),
    #"Filtered Rows1" = Table.SelectRows(#"Incremental Refresh Filter", each [something] = "code1" or [something] = "code2"),
    #"Changed Type" = Table.TransformColumnTypes(#"Filtered Rows1",{{"material", Int64.Type}, {"aedat", type datetime}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{columnnames}),
    // Latest calday per customer/fiscper ...
    #"GroupedData" = Table.Group(#"Removed Columns", {"customer", "fiscper"},
        {
            {"MaxCalday", each List.Max([calday]), type date}
        }
    ),
    // ... joined back so only the rows on that latest calday survive
    #"MergedData" = Table.Join(#"Removed Columns", {"customer", "fiscper", "calday"}, GroupedData, {"customer", "fiscper", "MaxCalday"}),
    // Highest some_id per customer/MaxCalday, same group-then-join pattern
    #"GroupedData2" = Table.Group(#"MergedData", {"customer", "MaxCalday"},
        {
            {"MaxId", each List.Max([some_id]), type text}
        }
    ),
    #"MergedData2" = Table.Join(#"MergedData", {"customer", "MaxCalday", "some_id"}, GroupedData2, {"customer", "MaxCalday", "MaxId"})
in
    #"MergedData2"
each [calday] >= Date.From(RangeStart) and [calday] < Date.From(RangeEnd)),
Flip this around to say:
each DateTime.From([calday]) >= RangeStart and DateTime.From([calday]) < RangeEnd),
(This converts the column instead of the parameters, leaving RangeStart/RangeEnd untouched in their native datetime type.)
Unfortunately it's still the same...
I did a bit of additional troubleshooting and figured out the cause of the issue; now I just need help figuring out the solution.
First, I realized that the "surplus" dates that appear are always part of a calendar month that falls into the next fiscper - I hope the example below explains it.
While my incremental refresh was set to refresh 3 months - I had 12 "surplus" dates.
I switched the refresh to 90 days - 192 "surplus" dates.
I switched to 1 Quarter - 0 "surplus" dates (but the month I'm looking at is March, so there's no real mismatch between the quarter per fiscper and the quarter per regular calendar).
Based on this, I assume that PBI Service takes whatever interval is selected in the incremental refresh policy (day, month, quarter, year) one by one, performs the calculation per interval, and appends the result to what was previously calculated.
Ideally it would take fiscper one by one (not month or similar), but I understand that's not possible. So is there some way to be 100% sure that there will be no surplus dates?
The only thing that comes to my mind is to refresh the full dataset each day, which kind of negates the point of incremental refresh.
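To illustrate that mechanism with made-up data (a sketch assuming month partitions; names and dates are illustrative): each partition only ever evaluates its own slice of rows, so List.Max([calday]) returns a per-partition maximum rather than the true maximum for the fiscper.

let
    // One fiscper spanning two month partitions (illustrative rows)
    FiscperRows = #table(type table [customer = text, fiscper = text, calday = date],
        {
            {"A", "2024003", #date(2024, 2, 24)},
            {"A", "2024003", #date(2024, 2, 26)},
            {"A", "2024003", #date(2024, 3, 28)}
        }),
    // Simulate month partitions: each one only sees its own slice
    FebPartition = Table.SelectRows(FiscperRows, each Date.Month([calday]) = 2),
    MarPartition = Table.SelectRows(FiscperRows, each Date.Month([calday]) = 3),
    // Feb 26th survives as a "surplus" date: it is the max of its partition,
    // even though the true max of the fiscper is Mar 28th
    FebMax = List.Max(FebPartition[calday]),
    MarMax = List.Max(MarPartition[calday])
in
    [FebMax = FebMax, MarMax = MarMax]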
To make matters worse - Incremental Refresh partitions are based on regular calendar dates, not fiscal calendar dates.
We ended up introducing a fake "Partition Date" datetime column in our data to translate our fiscal periods into the desired partitions.
Can you clarify this "Partition Date" column? How do you direct incremental refresh to base the partition on dates in that column and not on the other date column?
Let's say our fiscal year goes from September to August. I want to refresh partitions by fiscal year. So I introduce a Partition Date that is shifted four months, so that the "2023" partition now actually contains data for FY2023 rather than calendar year 2023.
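A hypothetical sketch of such a column (step and column names are illustrative; it assumes a [calday] date column and the September-August fiscal year above):

// Shift the real date forward by four months, so Sep 2022 lands in
// "calendar" 2023 and the whole of FY2023 falls into the 2023 partition
#"Added Partition Date" = Table.AddColumn(#"Previous Step", "Partition Date",
    each DateTime.From(Date.AddMonths([calday], 4)), type datetime),
// Incremental refresh then filters on the shifted column
#"Partition Filter" = Table.SelectRows(#"Added Partition Date",
    each [Partition Date] >= RangeStart and [Partition Date] < RangeEnd)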
There is an interesting side effect to this - Standard Incremental Refresh can't handle future dates. You need to use XMLA for that, but you are already using it anyway when you are manually refreshing partitions.