Power BI is turning 10, and we’re marking the occasion with a special community challenge. Use your creativity to tell a story, uncover trends, or highlight something unexpected.
Get startedJoin us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.
Which is the best long term solution here?
I have a demographics table for which I receive periodic updates to individual records. I only want the most recently updated record for each individual though, when I pull this into Power BI.
Both of these accomplish the same task, and neither seem like much of a drain at this time, with only a million or so records. That record count will balloon though.
First, I tried this bit, which uses Table.Buffer
Remove Duplicates and Keep the Last Record with Power Query ? (exceleratorbi.com.au)
and after reading a bit more, I accomplished the same feat using using Group By and Table.Max, as explained in this video.
How to filter a table to show only most recent date by group in Power Query (youtube.com)
I look forward to any and all thoughts from those more experienced. 🙂
... late addition ...
Here is another I stumbled on after reading the comment from another. It uses List.Max
Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD - BI Insight
The following are my personal thoughts.
The first solution, Table.Buffer is not a panacea. The method in the article is effective but dangerous. Because it will perform very poorly when the table is large, and may even have a counterproductive effect.
The second solution seems to have problems. When the amount of data is large, you can try the following methods: 1. Use Table.Group to group and only get the required grouping columns and the latest date 2. Use Inner Join to merge queries The reason for doing this is that the efficiency of Table.Select is very worrying.
In addition, when the data becomes more and more, you may use incremental refresh. But my personal understanding is that Power Query is more like a pipeline, not a data pool, and so much data should not be cached and calculated. If the power query query is slow to execute, you may also try to put it in DAX for calculation.
for which I receive periodic updates to individual records.
Power Query has no concept of that. The lowest you can go is to do "incremental refresh" which allows you to refresh individual partitions, but still in a flush and fill way.
What you are after is differential refresh, CDC (Change Data Capture) or some variation of SCD (for example SCD2)
To repeat - Power BI by itself does not support any of that - It is you who will have to put in the work to manage the information in a SCD2 table , for example.
The poor man's solution is incremental refresh with a larger "hot" window than what you think you need. But SCD2 is probably the second best least worst option.
Thanks for this that lead me to Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD - BI Insight
This one uses List.Max and Merge, which seems to be useful if you want to continue to use "Column Distribution" The YouTube option blows that away.
This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.
Check out the June 2025 Power BI update to learn about new features.
User | Count |
---|---|
14 | |
13 | |
12 | |
8 | |
8 |
User | Count |
---|---|
17 | |
10 | |
8 | |
7 | |
7 |