Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
D1274
Regular Visitor

Data loading does not complete when using custom functions

I am trying to add a custom function in PowerQuery to calculate the difference between the specified delivery date and the actual delivery date, taking into account holidays, but when I apply the query, it shows abnormal behavior when loading data.

  

Data

PAY_INFO_List.csv: delivery data

Calendar.csv: holiday data

 

Custom Functionslike NETWORKDAYS in Excel

(START_DATE as date, END_DATE as date, HOLIDAY_LIST as table) as number =>

 

let

 

  START_TO_END = List.Generate( () => List.Min({START_DATE,END_DATE}), each _ <= List.Max({START_DATE,END_DATE}), each Date.AddDays(_,1)),

 

  START_TO_END2 = List.Select(START_TO_END, each Date.DayOfWeek(_, Day.Monday) <= 4), //土日の削除

 

  BUSSINES_DAYS = List.Count( List.Difference( START_TO_END2, Table.Column(HOLIDAY_LIST,"休日") )) - 1, //START_DATE=END_DATEならゼロになる

 

  BUSSINES_DAYS2 = if START_DATE <= END_DATE then BUSSINES_DAYS else BUSSINES_DAYS * (-1)

 

in

 

  BUSSINES_DAYS2

 

Query "PAY_INFO_List"

Load PAY_INFO_List.csv and set as below step of the custom function

 START_DATE: "Delivery date" column

 END_DATE: "Actual delivery date" column

 HOLIDAY_LIST: query "Calendar" using Calendar.csv as source

 

After creating a query as above and select "Close and Apply", the data starts to be loaded,

but Calendar.csv is displayed as the source of PAY_INFO_List as shown below (to be precise, "from Calendar.csv" is displayed for most of the time, and "from PAY_INFO_List" is displayed for a moment only occasionally),

and the loaded size continues to increase without finishing the loading.

 

Display of Load

PAY_INFO_List

584MB from Calendar.csv ←This read size display keeps increasing (actual data size is 430MB)

 

Calendar

1.93KB from Calendar.csv

  スクリーンショット 2023-03-01 110211.png

 

 

 

 

 

 

 

What I tried

Eliminate the process of referencing the holiday list on the custom function → Loading was completed without any problems (however, this did not allow calculation of holidays).

Change the location where Calendar.csv is saved → No change in the phenomenon.

 

Version

Microsoft power BI Desktop

Version: 2.114.664.0 64-bit (February 2023)

 

 

This is the first time for me to use custom functions, so there may be a fundamental error.

I would appreciate it if you could let me know the cause of the problem and how to deal with it.

1 ACCEPTED SOLUTION

 

No, CSV combining isn't a problem per se, but it does depend on whether you mean appending or merging. Appending is a pretty resource-light operation, whereas merging is a whole-table operation so is incredibly resource-heavy.

 

I've done some more tests closer to your scenario, but used dataflow sources for both the calendar and the fact table. Here's what I got:

 

- 1,167 row calendar + ~900k row fact with 22 cols including working days col: avg apply time 3:53

- Same as above just removed the working days col: avg 0:47

- Same as above, but removing 10 text columns but keeping working days: avg 3:36

 

This won't surprise you as it's clearly in line with your observations. However, this has been done with 48GB RAM. Around 1.7GB was allocated to the PBI Desktop app instance, and around 1GB was allocated for containers, of which around 700MB was allocated to SSAS. As such, your times may be significantly higher than this running on 8GB overall.

 

The calculated column approach with a buffered calendar list has removed the multiple-scan issue, but I don't think we can avoid this slowing down the overall throughput rate any further. As such, and considering you have double the amount of rows based on combining 4 CSV files, I'd be looking to offload this process out of PBI Desktop. If you want to refresh your report daily, I'd recommend using Dataflows to do the processing overnight, then just bring the completed Dataflow into your report.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




View solution in original post

10 REPLIES 10
D1274
Regular Visitor

Thank you @BA_Pete  for your advise, but unfortunately the problem wasn't solved as below...

 

D1274_0-1677666412000.png

 

 

Ok, let's go a different way. Forget your function.

 

Add a custom step to your PAY_INFO_List table like this:

bufferHolList = List.Buffer(Calendar[休日]),

 ...and, after the custom step, add this as a new custom column on your PAY_INFO_List table:

 

let
    startDate = List.Min({[Delivery date], [Actual delivery date]}),
    endDate = List.Max({[Delivery date], [Actual delivery date]}),
    listDates =
        List.Dates(
            startDate,
            Number.From(endDate - startDate) + 1,
            #duration(1,0,0,0)
        ),
    removeHols =
        List.Difference(
            listDates,
            List.Transform(bufferHolList, Date.From)
        ),
    removeWeekends =
        List.Select(
            removeHols,
            each Date.DayOfWeek(_, Day.Monday) < 5
        ),
    countDays = List.Count(removeWeekends)
in
    countDays

 

 

You'll probably want to add your polarity switch back on at the end if you can have an actual delivery date before the delivery date, but this should be pretty performant either way.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




The display of Load recovered , but loading speed got too late.

It's less than 1kb/s.

 

D1274_0-1677734881198.png

 

 

Can you let me know how many rows you have in your calendar table, how many rows and columns are in your PAY_INFO_List table, and how much RAM you have on the machine you're running this on please?

I'll set up a test query matching your scenario that you can run to see if it's the query causing the problem or whether it's your source connection.

 

--Update--

I've just run this at my end using an 87 row calendar table and a 177,651 row, 3 column, fact table.

The column generated in a fraction a second and the query applied to the model in about 14 seconds with no rescans.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




PAY_INFO_List(convined 4 csv files) : 2766078 rows, 22 columns

calender table: 145rows

RAM: 8GB

 

Ok. I think it's the combining of the CSV files that's probably taking the time then. You'll see I've added an update to my previous post where I've done a fairly large test. Although it's not as large as your queries, it should give a fairly good indicator of whether there's major issues with the query.

I'll try to run a test with similar numbers to your queries later, but the CSV acquisition and combine process is, to mind mind, probably the issue.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




As I wrote in 【What I tried】,  Loading was completed without any problems when I eliminated the process of referencing the holiday list on the custom function. ( step of combining CSV is existing)

It took just several minutes to load.

 

Despite this, is combining CSV a problem?

 

No, CSV combining isn't a problem per se, but it does depend on whether you mean appending or merging. Appending is a pretty resource-light operation, whereas merging is a whole-table operation so is incredibly resource-heavy.

 

I've done some more tests closer to your scenario, but used dataflow sources for both the calendar and the fact table. Here's what I got:

 

- 1,167 row calendar + ~900k row fact with 22 cols including working days col: avg apply time 3:53

- Same as above just removed the working days col: avg 0:47

- Same as above, but removing 10 text columns but keeping working days: avg 3:36

 

This won't surprise you as it's clearly in line with your observations. However, this has been done with 48GB RAM. Around 1.7GB was allocated to the PBI Desktop app instance, and around 1GB was allocated for containers, of which around 700MB was allocated to SSAS. As such, your times may be significantly higher than this running on 8GB overall.

 

The calculated column approach with a buffered calendar list has removed the multiple-scan issue, but I don't think we can avoid this slowing down the overall throughput rate any further. As such, and considering you have double the amount of rows based on combining 4 CSV files, I'd be looking to offload this process out of PBI Desktop. If you want to refresh your report daily, I'd recommend using Dataflows to do the processing overnight, then just bring the completed Dataflow into your report.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




I understood that the calculation was too heavy.

I found NETWORKDAYS function in DAX, and I tried deleting custom function step in PowerQuery and adding NETWORDAYS column on PowerBI Data view.

As a result, NETWORKDAYS culculation was completed in a short time.

 

It was a mistake to culculate NETWORKDAYS in PowerQuery.

Thank you for support @BA_Pete .

BA_Pete
Super User
Super User

Hi @D1274 ,

 

The first thing I'd do is buffer the calendar to memory and see how that goes.

On the line after the 'let' in your function, add this line:

bufferHolList = Table.Buffer(HOLIDAY_LIST),

After that, change any references in your function from HOLIDAY_LIST to bufferHolList and see how you get on.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.