Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

We've captured the moments from FabCon & SQLCon that everyone is talking about, and we are bringing them to the community, live and on-demand. Starts on April 14th. Register now

Reply
jusTodd
Advocate IV
Advocate IV

Group vs Table.Buffer

Which is the best long term solution here?

 

I have a demographics table for which I receive periodic updates to individual records.  I only want the most recently updated record for each individual though, when I pull this into Power BI.

 

Both of these accomplish the same task, and neither seem like much of a drain at this time, with only a million or so records.  That record count will balloon though.

 

First, I tried this bit, which uses Table.Buffer

Remove Duplicates and Keep the Last Record with Power Query ? (exceleratorbi.com.au)

 

and after reading a bit more, I accomplished the same feat using using Group By and Table.Max, as explained in this video.

How to filter a table to show only most recent date by group in Power Query (youtube.com)

 

I look forward to any and all thoughts from those more experienced.  🙂

 

 

... late addition ...

Here is another I stumbled on after reading the comment from another.  It uses List.Max

Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD - BI Insight

 

3 REPLIES 3
ZhangKun
Super User
Super User

The following are my personal thoughts.

The first solution, Table.Buffer is not a panacea. The method in the article is effective but dangerous. Because it will perform very poorly when the table is large, and may even have a counterproductive effect.

 

The second solution seems to have problems. When the amount of data is large, you can try the following methods: 1. Use Table.Group to group and only get the required grouping columns and the latest date 2. Use Inner Join to merge queries The reason for doing this is that the efficiency of Table.Select is very worrying.

 

In addition, when the data becomes more and more, you may use incremental refresh. But my personal understanding is that Power Query is more like a pipeline, not a data pool, and so much data should not be cached and calculated. If the power query query is slow to execute, you may also try to put it in DAX for calculation.

lbendlin
Super User
Super User

 for which I receive periodic updates to individual records. 

Power Query has no concept of that. The lowest you can go is to do "incremental refresh"  which allows you to refresh individual partitions, but still in a flush and fill way.

 

What you are after is differential refresh, CDC (Change Data Capture) or some variation of SCD (for example SCD2)

 

To repeat - Power BI by itself does not support any of that - It is you who will have to put in the work to manage the information in a SCD2 table , for example.

 

The poor man's solution is incremental refresh with a larger "hot"  window than what you think you need.  But SCD2 is probably the second best least worst option.

Thanks for this that lead me to Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD - BI Insight

 

This one uses List.Max and Merge, which seems to be useful if you want to continue to use "Column Distribution"   The YouTube option blows that away.

Helpful resources

Announcements
New to Fabric survey Carousel

New to Fabric Survey

If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.

Power BI DataViz World Championships carousel

Power BI DataViz World Championships - June 2026

A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.

Join our Fabric User Panel

Join our Fabric User Panel

Share feedback directly with Fabric product managers, participate in targeted research studies and influence the Fabric roadmap.

March Power BI Update Carousel

Power BI Community Update - March 2026

Check out the March 2026 Power BI update to learn about new features.