The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hi all,
I'm facing to the slow performance in below query in DataFlow:
I am trying to calculate the days from the document lodged date until today by excluding weekend & public holidays.
I am not that an expert on Power Query M and actually below code is working. Only problem is too slow.
let
// Below is to get data from a specific table. This table has about 700K rows
// ----------------------------------------------------------------------------------
Source = Databricks.Catalogs(HostName, HttpPath, [Catalog = null, Database = null, EnableAutomaticProxyDiscovery = "enabled"]),
#"Navigation 1" = Source{[Name = Catalog, Kind = "Database"]}[Data],
#"Navigation 2" = #"Navigation 1"{[Name = "catalog", Kind = "Schema"]}[Data],
_table = #"Navigation 2"{[Name = "table", Kind = "Table"]}[Data],
// Below extracts all weekend and public holidays from the Calendar table and it contains about 32K rows
// of weekend and public holidays
_weekendOrHoliday = Table.SelectColumns(Table.SelectRows(Calendar, each [IsWeekend] = 1 or [IsPublicHoliday] = 1), "Primary_key_date"),
// Just get today to calculate days until today
_toDate = DateTime.Date(DateTime.LocalNow()),
_result = Table.AddColumn(_table, "DaysUntilToday",
each (
if [Date_Lodged] = null then 0
else (
let
_fromDate = DateTime.Date([Date_Lodged]),
_days = Duration.Days(_toDate - _fromDate),
// Below is to get the number of weekend and public holidays and here is the problem.
// It took too much time.. sometimes the whole script runs more than 2 hours due to below.
// --------------------------------------------------------------------------------------
_minus = Table.RowCount(
Table.SelectRows(_weekendOrHoliday,
each(
[Primary_key_date] > _fromDate and
[Primary_key_date] <= _toDate
)
)
)
in
_days - _minus
)
)
)
in
_result
I would appreciate if you have a look at the code and suggest anything to be enhanced in the aspect of the performance.
Thanks.
Solved! Go to Solution.
I am still trying to reduce the performance time and one of my efforts was to separate the calculation logic of weekend and public holidays from queries and it reduced the time to run so far.
I am not sure why but after I separated the calculation logic to a function and calling that from queries, performance has been enhanced.
Outstanding = Table.AddColumn(_decided, "DaysUntilToday",
each (
if [Lodged_date] = null then 0
else (
fx_calBusinessDays(DateTime.Date([Lodged_date]), _toDate)
)
)
),
When I tested with the original code with fully implemented 25 queries, it runs about / more than 2 hours but now it runs about an hour. Thanks.
I am still trying to reduce the performance time and one of my efforts was to separate the calculation logic of weekend and public holidays from queries and it reduced the time to run so far.
I am not sure why but after I separated the calculation logic to a function and calling that from queries, performance has been enhanced.
Outstanding = Table.AddColumn(_decided, "DaysUntilToday",
each (
if [Lodged_date] = null then 0
else (
fx_calBusinessDays(DateTime.Date([Lodged_date]), _toDate)
)
)
),
When I tested with the original code with fully implemented 25 queries, it runs about / more than 2 hours but now it runs about an hour. Thanks.
Hi,
Thanks for the solution @shafiz_p offered, and i want to offer some more informaiton for user to refer to.
hello @AlvinB , based on the code you have offered, you have about 32k rowa about the holiday, it is too large, maybe you can consider to reduce the amount of data in the calendar table, such as extracting only the data of the last one or two years, and extract the date table data of the most recent date commemoration according to the date you need now, so as to reduce the size of the calendar table data and improve efficiency.
And you can use diagnostics to diagnose which step is running the longest so that the step can be optimized more accurately
You can refer to the following link about it.
Query diagnostics - Power Query | Microsoft Learn
Best Regards!
Yolo Zhu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @AlvinB
Instead of calculating the number of weekends and public holidays for each row individually, you can precompute these counts for a range of dates and store them in a separate table. Then, you can join this precomputed table with your main table.
Power Query’s list functions can be more efficient than table operations. You can try to use List.Dates and List.Count to count the number of weekends and public holidays.
Try this:
let
// Get data from a specific table
Source = Databricks.Catalogs(HostName, HttpPath, [Catalog = null, Database = null, EnableAutomaticProxyDiscovery = "enabled"]),
#"Navigation 1" = Source{[Name = Catalog, Kind = "Database"]}[Data],
#"Navigation 2" = #"Navigation 1"{[Name = "catalog", Kind = "Schema"]}[Data],
_table = #"Navigation 2"{[Name = "table", Kind = "Table"]}[Data],
// Extract all weekend and public holidays from the Calendar table
_weekendOrHoliday = Table.SelectColumns(Table.SelectRows(Calendar, each [IsWeekend] = 1 or [IsPublicHoliday] = 1), "Primary_key_date"),
_weekendOrHolidayList = List.Buffer(Table.Column(_weekendOrHoliday, "Primary_key_date")),
// Get today to calculate days until today
_toDate = DateTime.Date(DateTime.LocalNow()),
_result = Table.AddColumn(_table, "DaysUntilToday",
each (
if [Date_Lodged] = null then 0
else (
let
_fromDate = DateTime.Date([Date_Lodged]),
_days = Duration.Days(_toDate - _fromDate),
_dateRange = List.Dates(_fromDate, _days, #duration(1, 0, 0, 0)),
_minus = List.Count(List.Intersect({_dateRange, _weekendOrHolidayList}))
in
_days - _minus
)
)
)
in
_result
Hope this helps!!
If this solved your problem, please accept it as a solution and a kudos!!
Best Regards,
Shahariar Hafiz
This morning, I was running the dataflow and this dataflow contains 25 queries and one of them contains the script I've posted. I was refreshing the dataflow 2 times. One with the original script and the other with the script you've suggested. Unfortunately, my dataflow runs faster with the original script rather than LIST function. With table select function, it runs about 25 mins otherwise about 40 mins with LIST function. The problem is I need to implement this code to several queries and this is why I need to enhnace my code. Thanks.