Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.

Reply
jusTodd
Advocate IV
Advocate IV

Group vs Table.Buffer

Which is the best long term solution here?

 

I have a demographics table for which I receive periodic updates to individual records.  I only want the most recently updated record for each individual though, when I pull this into Power BI.

 

Both of these accomplish the same task, and neither seem like much of a drain at this time, with only a million or so records.  That record count will balloon though.

 

First, I tried this bit, which uses Table.Buffer

Remove Duplicates and Keep the Last Record with Power Query ? (exceleratorbi.com.au)

 

and after reading a bit more, I accomplished the same feat using using Group By and Table.Max, as explained in this video.

How to filter a table to show only most recent date by group in Power Query (youtube.com)

 

I look forward to any and all thoughts from those more experienced.  🙂

 

 

... late addition ...

Here is another I stumbled on after reading the comment from another.  It uses List.Max

Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD - BI Insight

 

3 REPLIES 3
ZhangKun
Super User
Super User

The following are my personal thoughts.

The first solution, Table.Buffer is not a panacea. The method in the article is effective but dangerous. Because it will perform very poorly when the table is large, and may even have a counterproductive effect.

 

The second solution seems to have problems. When the amount of data is large, you can try the following methods: 1. Use Table.Group to group and only get the required grouping columns and the latest date 2. Use Inner Join to merge queries The reason for doing this is that the efficiency of Table.Select is very worrying.

 

In addition, when the data becomes more and more, you may use incremental refresh. But my personal understanding is that Power Query is more like a pipeline, not a data pool, and so much data should not be cached and calculated. If the power query query is slow to execute, you may also try to put it in DAX for calculation.

lbendlin
Super User
Super User

 for which I receive periodic updates to individual records. 

Power Query has no concept of that. The lowest you can go is to do "incremental refresh"  which allows you to refresh individual partitions, but still in a flush and fill way.

 

What you are after is differential refresh, CDC (Change Data Capture) or some variation of SCD (for example SCD2)

 

To repeat - Power BI by itself does not support any of that - It is you who will have to put in the work to manage the information in a SCD2 table , for example.

 

The poor man's solution is incremental refresh with a larger "hot"  window than what you think you need.  But SCD2 is probably the second best least worst option.

Thanks for this that lead me to Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD - BI Insight

 

This one uses List.Max and Merge, which seems to be useful if you want to continue to use "Column Distribution"   The YouTube option blows that away.

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors