Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
Anonymous
Not applicable

Power Query: Extract distinct values from column as new query

Hi,

 

I have a fairly large query with 1,5 million rows, which is growing by around 5000 rows per day. From that query I want to extract the distinct values from one specific column and create a new table with those values, so I can use it as a dimension and for forming relationships in my model.

 

Right now, I have referenced the first query, deleted all the columns besides the one I need, and then done a "Remove Duplicates", with a result of two distinct text values. This works, but at query refresh, this simple table containing only two text values as a final result will load all 1,5 million rows first, and then clean it up.

 

Is there a more efficient way to complete the same task in Power Query, so I can reduce refresh time?

1 ACCEPTED SOLUTION
v-shex-msft
Community Support
Community Support

Hi @Anonymous,

 

You can try to create a blank query to reference original source, then use list.distinct to remove duplicate records.

 

let
    DistinctSource = List.Distinct(Sheet2[Date])// List.Distinct(QueryName[ColumnName])
in
    DistinctSource

 

Sheet2(22582 rows) -> Date(953 rows)

9.PNG

 

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

View solution in original post

13 REPLIES 13
Stephen_Sink
Frequent Visitor

I struggled for a while with this becuase there is some key puntuation missing from this solution. Here's what worked for me - I am refrenceing another query

= List.Distinct(#"QueryName"[ColumnName])



Key things missing was having the # and "" in the right spot

HairyDrumroll
Advocate I
Advocate I

In the Power Query editor:

Right click on your existing query and choose "Reference".

Use "Choose Columns" to select only the column which you want your unique values generated from.

Right click on the column and choose "Remove Duplicates"

I like this solution because it allows you to continue to use table tools and create new custom column off the de-duplicated list. Power BI Guy's solution may be more effecient however.

This is the users current solution, however this is inefficient as it loads the entire table and then performs the transformations.

 

Current best solution is:

 

let
    DistinctSource = List.Distinct(Sheet2[Date])// List.Distinct(QueryName[ColumnName])
in
    DistinctSource

Your code is slightly wrong. I've edited it below 🙂

 

let
    DistinctSource = List.Distinct(Sheet2,"Date")// List.Distinct(QueryName,"ColumnName")
in
    DistinctSource

 

Thanks! This is just the alternative I was looking for.

Anonymous
Not applicable

Hello

I faced the same problem you asked for and i am quite new to power bi and mostly know its basics.

what i did was i created a new table in power bi desktop and  used the formula

 

NewTable = DISTINCT(Tablename [field])
 
and it worked.
v-shex-msft
Community Support
Community Support

Hi @Anonymous,

 

You can try to create a blank query to reference original source, then use list.distinct to remove duplicate records.

 

let
    DistinctSource = List.Distinct(Sheet2[Date])// List.Distinct(QueryName[ColumnName])
in
    DistinctSource

 

Sheet2(22582 rows) -> Date(953 rows)

9.PNG

 

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

This works. But the query still refreshes all the 1.5 million records! So it does not solve the question.

That means you still have to wait for the query to finish once more. This is made like this to "make shure" Power-Bi have all the data once more pulled from the souse. Quite annoying actually.    

I have a question for this. I am trying to do this myself:
you write // List.Distinct(QueryName[ColumnName]), is this the new query you're working in?

If not, what is it?

Hi @BiBra,

 

Nope, it just means the description of my formula.

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

@v-shex-msft Okay, however the code does not work for me

I get this:

fault2.PNG

Helpful resources

Announcements
Fabcon_Europe_Social_Bogo

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

Power BI Carousel June 2024

Power BI Monthly Update - June 2024

Check out the June 2024 Power BI update to learn about new features.

PBI_Carousel_NL_June

Fabric Community Update - June 2024

Get the latest Fabric updates from Build 2024, key Skills Challenge voucher deadlines, top blogs, forum posts, and product ideas.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.