Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more
I am trying to add a custom function in PowerQuery to calculate the difference between the specified delivery date and the actual delivery date, taking into account holidays, but when I apply the query, it shows abnormal behavior when loading data.
【Data】
PAY_INFO_List.csv: delivery data
Calendar.csv: holiday data
【Custom Functions】 like NETWORKDAYS in Excel
(START_DATE as date, END_DATE as date, HOLIDAY_LIST as table) as number =>
let
START_TO_END = List.Generate( () => List.Min({START_DATE,END_DATE}), each _ <= List.Max({START_DATE,END_DATE}), each Date.AddDays(_,1)),
START_TO_END2 = List.Select(START_TO_END, each Date.DayOfWeek(_, Day.Monday) <= 4), //土日の削除
BUSSINES_DAYS = List.Count( List.Difference( START_TO_END2, Table.Column(HOLIDAY_LIST,"休日") )) - 1, //START_DATE=END_DATEならゼロになる
BUSSINES_DAYS2 = if START_DATE <= END_DATE then BUSSINES_DAYS else BUSSINES_DAYS * (-1)
in
BUSSINES_DAYS2
【Query "PAY_INFO_List"】
Load PAY_INFO_List.csv and set as below step of the custom function
START_DATE: "Delivery date" column
END_DATE: "Actual delivery date" column
HOLIDAY_LIST: query "Calendar" using Calendar.csv as source
After creating a query as above and select "Close and Apply", the data starts to be loaded,
but Calendar.csv is displayed as the source of PAY_INFO_List as shown below (to be precise, "from Calendar.csv" is displayed for most of the time, and "from PAY_INFO_List" is displayed for a moment only occasionally),
and the loaded size continues to increase without finishing the loading.
【Display of Load】
PAY_INFO_List
584MB from Calendar.csv ←This read size display keeps increasing (actual data size is 430MB)
Calendar
1.93KB from Calendar.csv
【What I tried】
・Eliminate the process of referencing the holiday list on the custom function → Loading was completed without any problems (however, this did not allow calculation of holidays).
・Change the location where Calendar.csv is saved → No change in the phenomenon.
【Version】
Microsoft power BI Desktop
Version: 2.114.664.0 64-bit (February 2023)
This is the first time for me to use custom functions, so there may be a fundamental error.
I would appreciate it if you could let me know the cause of the problem and how to deal with it.
Solved! Go to Solution.
No, CSV combining isn't a problem per se, but it does depend on whether you mean appending or merging. Appending is a pretty resource-light operation, whereas merging is a whole-table operation so is incredibly resource-heavy.
I've done some more tests closer to your scenario, but used dataflow sources for both the calendar and the fact table. Here's what I got:
- 1,167 row calendar + ~900k row fact with 22 cols including working days col: avg apply time 3:53
- Same as above just removed the working days col: avg 0:47
- Same as above, but removing 10 text columns but keeping working days: avg 3:36
This won't surprise you as it's clearly in line with your observations. However, this has been done with 48GB RAM. Around 1.7GB was allocated to the PBI Desktop app instance, and around 1GB was allocated for containers, of which around 700MB was allocated to SSAS. As such, your times may be significantly higher than this running on 8GB overall.
The calculated column approach with a buffered calendar list has removed the multiple-scan issue, but I don't think we can avoid this slowing down the overall throughput rate any further. As such, and considering you have double the amount of rows based on combining 4 CSV files, I'd be looking to offload this process out of PBI Desktop. If you want to refresh your report daily, I'd recommend using Dataflows to do the processing overnight, then just bring the completed Dataflow into your report.
Pete
Proud to be a Datanaut!
Ok, let's go a different way. Forget your function.
Add a custom step to your PAY_INFO_List table like this:
bufferHolList = List.Buffer(Calendar[休日]),
...and, after the custom step, add this as a new custom column on your PAY_INFO_List table:
let
startDate = List.Min({[Delivery date], [Actual delivery date]}),
endDate = List.Max({[Delivery date], [Actual delivery date]}),
listDates =
List.Dates(
startDate,
Number.From(endDate - startDate) + 1,
#duration(1,0,0,0)
),
removeHols =
List.Difference(
listDates,
List.Transform(bufferHolList, Date.From)
),
removeWeekends =
List.Select(
removeHols,
each Date.DayOfWeek(_, Day.Monday) < 5
),
countDays = List.Count(removeWeekends)
in
countDays
You'll probably want to add your polarity switch back on at the end if you can have an actual delivery date before the delivery date, but this should be pretty performant either way.
Pete
Proud to be a Datanaut!
The display of Load recovered , but loading speed got too late.
It's less than 1kb/s.
Can you let me know how many rows you have in your calendar table, how many rows and columns are in your PAY_INFO_List table, and how much RAM you have on the machine you're running this on please?
I'll set up a test query matching your scenario that you can run to see if it's the query causing the problem or whether it's your source connection.
--Update--
I've just run this at my end using an 87 row calendar table and a 177,651 row, 3 column, fact table.
The column generated in a fraction a second and the query applied to the model in about 14 seconds with no rescans.
Pete
Proud to be a Datanaut!
PAY_INFO_List(convined 4 csv files) : 2766078 rows, 22 columns
calender table: 145rows
RAM: 8GB
Ok. I think it's the combining of the CSV files that's probably taking the time then. You'll see I've added an update to my previous post where I've done a fairly large test. Although it's not as large as your queries, it should give a fairly good indicator of whether there's major issues with the query.
I'll try to run a test with similar numbers to your queries later, but the CSV acquisition and combine process is, to mind mind, probably the issue.
Pete
Proud to be a Datanaut!
As I wrote in 【What I tried】, Loading was completed without any problems when I eliminated the process of referencing the holiday list on the custom function. ( step of combining CSV is existing)
It took just several minutes to load.
Despite this, is combining CSV a problem?
No, CSV combining isn't a problem per se, but it does depend on whether you mean appending or merging. Appending is a pretty resource-light operation, whereas merging is a whole-table operation so is incredibly resource-heavy.
I've done some more tests closer to your scenario, but used dataflow sources for both the calendar and the fact table. Here's what I got:
- 1,167 row calendar + ~900k row fact with 22 cols including working days col: avg apply time 3:53
- Same as above just removed the working days col: avg 0:47
- Same as above, but removing 10 text columns but keeping working days: avg 3:36
This won't surprise you as it's clearly in line with your observations. However, this has been done with 48GB RAM. Around 1.7GB was allocated to the PBI Desktop app instance, and around 1GB was allocated for containers, of which around 700MB was allocated to SSAS. As such, your times may be significantly higher than this running on 8GB overall.
The calculated column approach with a buffered calendar list has removed the multiple-scan issue, but I don't think we can avoid this slowing down the overall throughput rate any further. As such, and considering you have double the amount of rows based on combining 4 CSV files, I'd be looking to offload this process out of PBI Desktop. If you want to refresh your report daily, I'd recommend using Dataflows to do the processing overnight, then just bring the completed Dataflow into your report.
Pete
Proud to be a Datanaut!
I understood that the calculation was too heavy.
I found NETWORKDAYS function in DAX, and I tried deleting custom function step in PowerQuery and adding NETWORDAYS column on PowerBI Data view.
As a result, NETWORKDAYS culculation was completed in a short time.
It was a mistake to culculate NETWORKDAYS in PowerQuery.
Thank you for support @BA_Pete .
Hi @D1274 ,
The first thing I'd do is buffer the calendar to memory and see how that goes.
On the line after the 'let' in your function, add this line:
bufferHolList = Table.Buffer(HOLIDAY_LIST),
After that, change any references in your function from HOLIDAY_LIST to bufferHolList and see how you get on.
Pete
Proud to be a Datanaut!
The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!