March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hi everybody !
Here is my issue :
I have this set of data :
SAMU FEATURE Y/N
SAMU 1 A Yes
SAMU 1 B Yes
SAMU 1 C No
SAMU 2 A Yes
SAMU 2 B No
SAMU 2 C No
I would like to calculate the "coefficient of similarity" between SAMU 1 and SAMU 2.
I think it's a measure, but i don't know how to say "Between Samu 1 and samu 2, For each feature, calculate if the features have the same value and calculate the percentage of similarity"
For the exemple, the correct percentage shoud be : 66%
Of course, my real data set have smg like 100 SAMU, and 30+ features. But the Y/N column will always be yes or no.
On my report, i ask to chose two SAMU, and i would like to screen the percentage.
Any idea ?
Thank you so muck for your help and have good day !!!!
Solved! Go to Solution.
Hi @Anonymous
Sorry for the late reply.
Add a measure
Measure =
VAR left =
CONCATENATEX (
VALUES ( 'left slicer' ),
'left slicer'[SAMU],
","
)
VAR right =
CONCATENATEX (
VALUES ( 'right slicer' ),
'right slicer'[SAMU],
","
)
RETURN
IF (
FIND (
MAX ( [SAMU] ),
left,
1,
0
)
|| FIND (
MAX ( [SAMU] ),
right,
1,
0
),
1,
0
)
then modify a measure as below
percent =
VAR P =
CALCULATE (
DISTINCTCOUNT ( 'Table'[FEATURE] ),
FILTER (
'Table',
[similarity1] = 1
&& [Measure] = 1
)
)
/ CALCULATE (
DISTINCTCOUNT ( 'Table'[FEATURE] ),
FILTER (
ALLSELECTED ( 'Table' ),
[Measure] = 1
)
)
RETURN
IF (
ISBLANK ( P ),
0,
P
)
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hello Maggie
This is exactrly what i needed !!!
Thank you so much for your amazing work !
love this community !
Accepted as solution ofc !
Hi @Anonymous
Could you clear me that?
the correct percent 66% is for all features or for each feature?
We look into the column "Y/N" for the similarity based on different "SAUM" category for each feature, right?
To test it on my side, i recongize "yes" as 1, "no" as 0, then calcualte the similarity, is my job going on the right way?
Hey v_juan_li !
Thx for your help !
So the report will be smg like that :
The user chose one SAMU (SAMU is an emergency structure in french) on the left dropdown and an other on the right one.
The goal is to identify the percentage of similarity between the 2 SAMU.
In the set of data i gave here, of course we only have the choice to compare SAMU 1 with SAMU 2.
How the calcul of similarty is :
SAMU 1 FEATURE A = YES and SAMU 2 FEATURE A = YES
then FEATURE A is similar (+1)
SAMU 1 FEATURE B = YES and SAMU 2 FEATURE B = NO
then FEATURE B is not similar (+0)
SAMU 1 FEATURE C = NOand SAMU 2 FEATURE C = NO
then FEATURE B is similar (+1)
Similarity :
Similarity (2) divided by the total of features (3) = 2/3 = 66%
So we look into the column "Y/N" for the similarity based on different "SAMU" category for all feature.
Thank you
Hi @Anonymous
Create two tables
left slicer = VALUES('Table'[SAMU])
right slicer = VALUES('Table'[SAMU])
add columns to slicers.
Create two measures in table
similarity1 =
VAR left =
CONCATENATEX (
VALUES ( 'left slicer' ),
'left slicer'[SAMU],
","
)
VAR right =
CONCATENATEX (
VALUES ( 'right slicer' ),
'right slicer'[SAMU],
","
)
VAR left_yn =
CALCULATE (
MAX ( 'Table'[Y/N] ),
FILTER (
ALLSELECTED ( 'Table' ),
'Table'[FEATURE]
= MAX ( 'Table'[FEATURE] )
&& FIND (
[SAMU],
left,
1,
0
) > 0
)
)
VAR right_yn =
CALCULATE (
MAX ( 'Table'[Y/N] ),
FILTER (
ALLSELECTED ( 'Table' ),
'Table'[FEATURE]
= MAX ( 'Table'[FEATURE] )
&& FIND (
[SAMU],
right,
1,
0
) > 0
)
)
RETURN
IF (
left_yn = right_yn,
1,
0
)
percent =
CALCULATE (
DISTINCTCOUNT ( 'Table'[FEATURE] ),
FILTER (
'Table',
[similarity1] <> 0
)
)
/ CALCULATE (
DISTINCTCOUNT ( 'Table'[FEATURE] ),
ALLSELECTED ( 'Table' )
)
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi Maggi !
Thanks for you amazing job !!!!!!
Itis realy near to be functonial !!
Indeed, it works with a dataset with only 2 SAMU, but when i do it with my real dataset (which contains 100 SAMU), the percentage screened is wrong.
I think it's because he counts in the total of similarity with SAMU which are not chosen in the slicers.
Let's have a look :
I guess he counts in the total of similarity the number for all SAMU, even if they are not chosen in the slicers.
That's why the percentage screened is always > 90%.
I dont have the skills to correct the measure you wrote, this is too much complicated for me ...
I don't know how to attach a document (the .pix) with my post to give you the full dataset ...
EDIT : I think the answer shoud be to modify only the percentage measure to count in the first total only the SAMU chosen in the slicer .. but don't know how to do that !
Hi @Anonymous
Add some data to your original data example(you could paste here), and share me the expected result for the example.
Best Regards
Maggie
Hey Maggie,
i hope you're good today.
So here is a better view of my dataset :
And here is the number i want, dpeending on what SAMU i chose in the left and right slicers.
thanks !
Hi @Anonymous
Sorry for the late reply.
Add a measure
Measure =
VAR left =
CONCATENATEX (
VALUES ( 'left slicer' ),
'left slicer'[SAMU],
","
)
VAR right =
CONCATENATEX (
VALUES ( 'right slicer' ),
'right slicer'[SAMU],
","
)
RETURN
IF (
FIND (
MAX ( [SAMU] ),
left,
1,
0
)
|| FIND (
MAX ( [SAMU] ),
right,
1,
0
),
1,
0
)
then modify a measure as below
percent =
VAR P =
CALCULATE (
DISTINCTCOUNT ( 'Table'[FEATURE] ),
FILTER (
'Table',
[similarity1] = 1
&& [Measure] = 1
)
)
/ CALCULATE (
DISTINCTCOUNT ( 'Table'[FEATURE] ),
FILTER (
ALLSELECTED ( 'Table' ),
[Measure] = 1
)
)
RETURN
IF (
ISBLANK ( P ),
0,
P
)
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi everybody ?
Does someone can help me ? 😞
Thank you
Refer if this can help
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/td-p/196274
Thanks for your answer
This look complicated .. and i dont really understand how to apply it !
To start, i try to find the number of rows similar when filters are applied :
User | Count |
---|---|
117 | |
77 | |
58 | |
52 | |
46 |
User | Count |
---|---|
171 | |
117 | |
63 | |
57 | |
51 |