Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
Anonymous
Not applicable

Calcule similarity between two item

Hi everybody !

 

Here is my issue :

I have this set of data :

 

SAMU             FEATURE        Y/N

SAMU 1          A                     Yes

SAMU 1          B                     Yes

SAMU 1          C                     No

SAMU 2          A                     Yes

SAMU 2          B                     No

SAMU 2          C                     No

 

I would like to calculate the "coefficient of similarity" between SAMU 1 and SAMU 2.

I think it's a measure, but i don't know how to say "Between Samu 1 and samu 2, For each feature, calculate if the features have the same value and calculate the percentage of similarity"

For the exemple, the correct percentage shoud be : 66%

 

Of course, my real data set have smg like 100 SAMU, and 30+ features. But the Y/N column will always be yes or no.

 

On my report, i ask to chose two SAMU, and i would like to screen the percentage.

 

Any idea ?

 

Thank you so muck for your help and have good day !!!!

1 ACCEPTED SOLUTION

Hi @Anonymous 

Sorry for the late reply.

Add a measure

 

Measure =
VAR left =
    CONCATENATEX (
        VALUES ( 'left slicer' ),
        'left slicer'[SAMU],
        ","
    )
VAR right =
    CONCATENATEX (
        VALUES ( 'right slicer' ),
        'right slicer'[SAMU],
        ","
    )
RETURN
    IF (
        FIND (
            MAX ( [SAMU] ),
            left,
            1,
            0
        )
            || FIND (
                MAX ( [SAMU] ),
                right,
                1,
                0
            ),
        1,
        0
    )

 

then modify a measure as below

 

percent =
VAR P =
    CALCULATE (
        DISTINCTCOUNT ( 'Table'[FEATURE] ),
        FILTER (
            'Table',
            [similarity1] = 1
                && [Measure] = 1
        )
    )
        / CALCULATE (
            DISTINCTCOUNT ( 'Table'[FEATURE] ),
            FILTER (
                ALLSELECTED ( 'Table' ),
                [Measure] = 1
            )
        )
RETURN
    IF (
        ISBLANK ( P ),
        0,
        P
    )

 

15.JPG

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

11 REPLIES 11
Anonymous
Not applicable

Hello Maggie 

 

This is exactrly what i needed !!!

Thank you so much for your amazing work !

 

love this community !

 

Accepted as solution ofc !

v-juanli-msft
Community Support
Community Support

Hi @Anonymous 

Could you clear me that?

the correct percent 66% is for all features or for each feature?

We look into the column "Y/N" for the similarity based on different "SAUM" category for each feature, right?

To test it on my side, i recongize "yes" as 1, "no" as 0, then calcualte the similarity, is my job going on the right way?

 
Best Regards
Maggie

 

Anonymous
Not applicable

Hey v_juan_li !

 

Thx for your help !

 

So the report will be smg like that :

SAMU.png

The user chose one SAMU (SAMU is an emergency structure in french) on the left dropdown and an other on the right one.

The goal is to identify the percentage of similarity between the 2 SAMU.

In the set of data i gave here, of course we only have the choice to compare SAMU 1 with SAMU 2.

 

How the calcul of similarty is :

SAMU 1 FEATURE A = YES and SAMU 2 FEATURE A = YES

then FEATURE A is similar (+1)

 

SAMU 1 FEATURE B = YES and SAMU 2 FEATURE B = NO

then FEATURE B is not similar (+0)

 

SAMU 1 FEATURE C = NOand SAMU 2 FEATURE C = NO

then FEATURE B is similar (+1)

 

Similarity : 

Similarity (2) divided by the total of features (3) = 2/3 = 66%

 

So we look into the column "Y/N" for the similarity based on different "SAMU" category for all feature.

 

Thank you

 

 

Hi @Anonymous 

Create two tables

left slicer = VALUES('Table'[SAMU])

right slicer = VALUES('Table'[SAMU])

add columns to slicers.

 

Create two measures in table

Capture15.JPG

 

similarity1 =
VAR left =
    CONCATENATEX (
        VALUES ( 'left slicer' ),
        'left slicer'[SAMU],
        ","
    )
VAR right =
    CONCATENATEX (
        VALUES ( 'right slicer' ),
        'right slicer'[SAMU],
        ","
    )
VAR left_yn =
    CALCULATE (
        MAX ( 'Table'[Y/N] ),
        FILTER (
            ALLSELECTED ( 'Table' ),
            'Table'[FEATURE]
                = MAX ( 'Table'[FEATURE] )
                && FIND (
                    [SAMU],
                    left,
                    1,
                    0
                ) > 0
        )
    )
VAR right_yn =
    CALCULATE (
        MAX ( 'Table'[Y/N] ),
        FILTER (
            ALLSELECTED ( 'Table' ),
            'Table'[FEATURE]
                = MAX ( 'Table'[FEATURE] )
                && FIND (
                    [SAMU],
                    right,
                    1,
                    0
                ) > 0
        )
    )
RETURN
    IF (
        left_yn = right_yn,
        1,
        0
    )
percent =
CALCULATE (
    DISTINCTCOUNT ( 'Table'[FEATURE] ),
    FILTER (
        'Table',
        [similarity1] <> 0
    )
)
    / CALCULATE (
        DISTINCTCOUNT ( 'Table'[FEATURE] ),
        ALLSELECTED ( 'Table' )
    )

 

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous
Not applicable

Hi Maggi !

 

Thanks for you amazing job !!!!!!

Itis realy near to be functonial !!

 

Indeed, it works with a dataset with only 2 SAMU, but when i do it with my real dataset (which contains 100 SAMU), the percentage screened is wrong.

I think it's because he counts in the total of similarity with SAMU which are not chosen in the slicers.

Let's have a look :

PBI data.png

I guess he counts in the total of similarity the number for all SAMU, even if they are not chosen in the slicers.

That's why the percentage screened is always > 90%.

 

I dont have the skills to correct the measure you wrote, this is too much complicated for me ...

I don't know how to attach a document (the .pix) with my post to give you the full dataset ...

 

EDIT : I think the answer shoud be to modify only the percentage measure to count in the first total only the SAMU chosen in the slicer .. but don't know how to do that !

Hi @Anonymous 

Add some data to your original data example(you could paste here), and share me the expected result for the example.

 

Best Regards

Maggie

Anonymous
Not applicable

Hey Maggie,

 

i hope you're good today.

 

So here is a better view of my dataset :

PBI forum.png

 

And here is the number i want, dpeending on what SAMU i chose in the left and right slicers.

 

PBI forum 2.png

 

thanks !

Hi @Anonymous 

Sorry for the late reply.

Add a measure

 

Measure =
VAR left =
    CONCATENATEX (
        VALUES ( 'left slicer' ),
        'left slicer'[SAMU],
        ","
    )
VAR right =
    CONCATENATEX (
        VALUES ( 'right slicer' ),
        'right slicer'[SAMU],
        ","
    )
RETURN
    IF (
        FIND (
            MAX ( [SAMU] ),
            left,
            1,
            0
        )
            || FIND (
                MAX ( [SAMU] ),
                right,
                1,
                0
            ),
        1,
        0
    )

 

then modify a measure as below

 

percent =
VAR P =
    CALCULATE (
        DISTINCTCOUNT ( 'Table'[FEATURE] ),
        FILTER (
            'Table',
            [similarity1] = 1
                && [Measure] = 1
        )
    )
        / CALCULATE (
            DISTINCTCOUNT ( 'Table'[FEATURE] ),
            FILTER (
                ALLSELECTED ( 'Table' ),
                [Measure] = 1
            )
        )
RETURN
    IF (
        ISBLANK ( P ),
        0,
        P
    )

 

15.JPG

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous
Not applicable

Hi everybody ?

 

Does someone can help me ? 😞

 

Thank you

amitchandak
Super User
Super User

Refer if this can help

https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/td-p/196274

Join us as experts from around the world come together to shape the future of data and AI!
At the Microsoft Analytics Community Conference, global leaders and influential voices are stepping up to share their knowledge and help you master the latest in Microsoft Fabric, Copilot, and Purview.
️ November 12th-14th, 2024
 Online Event
Register Here
Anonymous
Not applicable

Thanks for your answer

 

This look complicated .. and i dont really understand how to apply it !

 

To start, i try to find the number of rows similar when filters are applied :

XXX =
CALCULATE(
COUNTROWS(TEST);
FILTER(TEST_2;
TEST[SAMU]=TEST_2[SAMU]
&& TEST[ITEM] = TEST_2[ITEM]
&& TEST[CRITERE] = TEST_2[CRITERE]
)
)
 
but it does not really work ..
 
Any idea, ?
 

Helpful resources

Announcements
ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.