The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi,
I have a fairly large query with 1,5 million rows, which is growing by around 5000 rows per day. From that query I want to extract the distinct values from one specific column and create a new table with those values, so I can use it as a dimension and for forming relationships in my model.
Right now, I have referenced the first query, deleted all the columns besides the one I need, and then done a "Remove Duplicates", with a result of two distinct text values. This works, but at query refresh, this simple table containing only two text values as a final result will load all 1,5 million rows first, and then clean it up.
Is there a more efficient way to complete the same task in Power Query, so I can reduce refresh time?
Solved! Go to Solution.
Hi @Anonymous,
You can try to create a blank query to reference original source, then use list.distinct to remove duplicate records.
let DistinctSource = List.Distinct(Sheet2[Date])// List.Distinct(QueryName[ColumnName]) in DistinctSource
Sheet2(22582 rows) -> Date(953 rows)
Regards,
Xiaoxin Sheng
I struggled for a while with this becuase there is some key puntuation missing from this solution. Here's what worked for me - I am refrenceing another query
= List.Distinct(#"QueryName"[ColumnName])
Key things missing was having the # and "" in the right spot
In the Power Query editor:
Right click on your existing query and choose "Reference".
Use "Choose Columns" to select only the column which you want your unique values generated from.
Right click on the column and choose "Remove Duplicates"
I like this solution because it allows you to continue to use table tools and create new custom column off the de-duplicated list. Power BI Guy's solution may be more effecient however.
This is the users current solution, however this is inefficient as it loads the entire table and then performs the transformations.
Current best solution is:
let DistinctSource = List.Distinct(Sheet2[Date])// List.Distinct(QueryName[ColumnName]) in DistinctSource
Your code is slightly wrong. I've edited it below 🙂
let DistinctSource = List.Distinct(Sheet2,"Date")// List.Distinct(QueryName,"ColumnName") in DistinctSource
Thanks! This is just the alternative I was looking for.
Hello
I faced the same problem you asked for and i am quite new to power bi and mostly know its basics.
what i did was i created a new table in power bi desktop and used the formula
Hi @Anonymous,
You can try to create a blank query to reference original source, then use list.distinct to remove duplicate records.
let DistinctSource = List.Distinct(Sheet2[Date])// List.Distinct(QueryName[ColumnName]) in DistinctSource
Sheet2(22582 rows) -> Date(953 rows)
Regards,
Xiaoxin Sheng
This works. But the query still refreshes all the 1.5 million records! So it does not solve the question.
That means you still have to wait for the query to finish once more. This is made like this to "make shure" Power-Bi have all the data once more pulled from the souse. Quite annoying actually.
I have a question for this. I am trying to do this myself:
you write // List.Distinct(QueryName[ColumnName]), is this the new query you're working in?
If not, what is it?
Hi @BiBra,
Nope, it just means the description of my formula.
Regards,
Xiaoxin Sheng
I get this:
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
114 | |
80 | |
78 | |
47 | |
39 |
User | Count |
---|---|
148 | |
115 | |
65 | |
64 | |
53 |