The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
When using the Transform Data feature, over the View tab, there's an option that we can check: Column distribution, which brings distinct and unique values... What's the difference between these values? By Distinct values can I assume there are duplicates of the same value and for unique there are no duplicates? I already went into the documentation but it still is not clear. Thanks.
https://docs.microsoft.com/en-us/power-query/data-profiling-tools
Hello, does distinct calculates null too?
A good question given that the link that mario_ruiz's description of the "definitive" MicroDoc he linked to falls short.
The page https://docs.microsoft.com/en-us/power-query/data-profiling-tools mentions the terms 'Distinct' and 'Unique' but provides no definition.
MahyarTF and HoangHugo describe it perfectly.
I did a test to see the facts. Findings below the image, some interesting.
Distinct is the count of distinct values as in SQL:
Select count(distinct colname)
from tablename;
Unique is equivalent of:
with uniq(ResellerKey, cnt)
as
(
select ResellerKey, count(*) as cnt
from FactResellerSales
group by ResellerKey
having count(*) = 1
)
select count(*)
from uniq
Enough of the SQL.
This image demonstrates the results. From Excel, the blanks showed null. From CSV with consecutive commas it showed blank. Rather than two images I change the null to blank in the 4th column of the Excel load.
It is clear that null and blank are counted in Distinct. Multiple null/blank did not make it into Unique. Simple image for you to count yourself to full understand.
UPDATED IMAGE: I put a blank in No Duplicates to show that blank/null can be considered Distinct.
Interestingly, I had to close PBI Desktop to update the Empty to 7% in No Duplicates column. Refresh didn't do it. Neither did Apply and Close and then Transform. The profiling must be cached.
Hi,
Distinct mean : count all the values as 1, even if there was more than one.
Unique mean : count only the value that are not repeated in the particular column
Appreciate your Kudos
Hi,
I found it from Online
“Distinct” means total number of different values regardless how many times it appears in the dataset. A name appears in the list multiple times is counted as 1 distinct count.
Whereas, the “Unique” value is total number of values that only appear once.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
111 | |
79 | |
71 | |
48 | |
41 |
User | Count |
---|---|
136 | |
108 | |
71 | |
64 | |
59 |