Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
emarc1
Advocate II
Advocate II

Most optimised way to merge data into a calendar data at scale

I have historic data for stock movements in a warehouse. Each time an SKU is produced/moved/sold, there is a log. Some items have data for most days, but others can only move once every few weeks. I've taken the last movement for each item on each day (let's call this 'item-day' data), and merged this with a blank calendar table, then filled the blank days with the values from the previous days. The problem is that if there's 10,000 items, and a calendar for each, it results in 3.65M rows. Power BI handles the rows fine, but Power Query starts to struggle with the nesting and multi-column merges. There's around 600k rows of item-day data that need merging into those 3.65M rows... That's slow. But when you go up to 5 years, that's 3M rows into 18.25M. It's too exponential to scale up.

 

Are there more optimised ways to fill in the data between the dates?

  • Are there more optimised ways of filling in the days with no data, without just merging each item with the full list of dates?
  • Should I be using DAX to do the calendar merge bits instead?
  • Would it help to partition the query so it just does each item one at a time and then append all of them (so it only needs to match several rows to 365 date rows, 10,000 times, rather than trying to match half a million or so rows to 3.65M other rows one time)?

I've already tried to speed up the merges using JoinAlgorithm.SortMerge, but unfortunately the original SQL data is in EBCDIC sort order, rather than ASCII so I need to sort the data in PQ first anyway.

2 REPLIES 2
ThxAlot
Super User
Super User

It seems weird for me to merge data into a calendar table.

I tweaked one of my previous reports to show you the fundamental pattern to deal with such a situation.

ThxAlot_0-1704394925374.png



Expertise = List.Accumulate(


        {Days as from Today},


        {Skills and Knowledge},


        (Current, Everyday) => Current & Day.LeanAndPractise(Everyday)


)



Putting it into a calendar certainly makes it easier to do a lot of the calcs. I originally made this dataset so I could see the total value of all our stocked items through the year and flick between different item groups. Makes it super easy to chart and slice in that more data warehouse setup. I'll try your method out, though I think it'll have to run 365 times over the 600k rows of non-calendar merged data. 219M processes to do on the fly, rather than 3.25M, to make a chart line like in the one below:

Capture.PNG

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.