Solved: any other place in the code I need to use DISTINCT...

hxkresl · ‎03-31-2017

On seperate post https://community.powerbi.com/t5/Desktop/DAX-equivalent-to-complex-SQL-query/m-p/151869#M65682 Phil gave me 3 DAX code segments which I have used to create a 100% DAX solution in place of SQL Query based one. That's wonderful. But, I have one lingering problem of high RAC.ReportA_AttributeCount counts returning from the dataset which in SQL I resolve using DISTINCT.

I have applied DISTINCT here (line 4) and it has helped reduce the inflated counts of attributeNames

#ReportAttributeCount = SELECTCOLUMNS(
                           SUMMARIZECOLUMNS(
                           'Report-AttributeMap'[Report Name],
                           "rptA - Total Attributes", COUNTROWS(DISTINCT('Report-AttributeMap'))
                           ),
                             "ReportA", 'Report-AttributeMap'[Report Name],
                             "rptA - Total Attributes", [rptA - Total Attributes]
)

but still some higher than true counts are returning. the two other code segments are below.

CommonAttributesCount = 
               SUMMARIZE( FILTER(
                GENERATE(
                             SELECTCOLUMNS(
                                    Table2,
                                    "ReportName",'Table2'[ReportName],
                                    "AttributeName",'Table2'[AttributeName]
                                    ) ,                   
                            SELECTCOLUMNS(
                                Table2,
                                "ReportName2",'Table2'[ReportName],
                                "AttributeName2",'Table2'[AttributeName]
                                )
                              ),[AttributeName]=[AttributeName2] && [ReportName]< [ReportName2]
                              ),
                              [ReportName],
                              [ReportName2],
                              [AttributeName]
                              )

ReportTable = 

VAR CommonAttributesCount2 = 
    SELECTCOLUMNS(
        SUMMARIZE(
        'CommonAttributesCount',
        [ReportName],
        [ReportName2],
        "CommonAttributeCount",DISTINCTCOUNT('CommonAttributesCount'[AttributeName])),"CACA.ReportA",[ReportName],"CACA.ReportB",[ReportName2],"CACA.DC",[CommonAttributeCount])

VAR 
    FINAL = ADDCOLUMNS(
        FILTER(
            CROSSJOIN(
                FILTER(
                    CROSSJOIN(
                        SELECTCOLUMNS(
                            ReportAttributeCount,
                            "RAC.ReportA",[ReportA],
                            "RAC.ReportA_AttributeCount",
                            [ReportA_AttributeCount]
                            ),
                       CommonAttributesCount2),
                      [RAC.ReportA]=[CACA.ReportA]
                   ),
                SELECTCOLUMNS(
                    'ReportAttributeCount',
                    "RACB.ReportA",[ReportA],
                    "RACB.ReportA_AttributeCount",[ReportA_AttributeCount]
                    )
                ),[CACA.ReportB]=[RACB.ReportA])
                ,
                "reportApercentage",DIVIDE([CACA.DC],[RAC.ReportA_AttributeCount]),
                "reportBpercentage",DIVIDE([CACA.DC],[RACB.ReportA_AttributeCount]))
    
RETURN FINAL

In addition to verified distinct SQL counts I have verified manually that the counts are higher than truly present in the underlying data (by doing export to excel for verification)

Can someone see anywhere in any of the code segments where DISTINCT can be added to ensure unique ReportName/AttributeName pairs are returned? Or have I applied the DISTINCT function incorrectly?

Phil_Seamark · ‎03-31-2017

I think I have it.

Please update the code for the 'ReportAttributeCount' table (changes highlight)

ReportAttributeCount = SELECTCOLUMNS(
    SUMMARIZE(
            'Report-AttributeMap',
            'Report-AttributeMap'[ReportName],
            "Distinct Count",
            CALCULATE(DISTINCTCOUNT('Report-AttributeMap'[AttributeName])
                )
               ),
    "ReportA",[ReportName],
    "ReportA_AttributeCount",[Distinct Count]
	)

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

View solution in original post

Phil_Seamark · ‎03-31-2017

Hi @hxkresl

Just a couple of questions that will help me. So I understand the code is ok for the small sample dataset, yes? 🙂

Any chance you can give me a slightly bigger dataset including some duplication that you want to ignore. It would be great if you can do this and also provide the expected result for the bigger dataset to help me pick the best place to apply the de-dup.

Cheers,

Phil

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

hxkresl · ‎03-31-2017

Yes, no issues with duplicates (when using DISTINCT) on sample dataset.

I generated schema and data script for the actual dataset, saved as file called schemaanddatafor_ReportTable, and placed it here

https://www.dropbox.com/sh/6kgldb1fhfichy4/AAAbXQhUr8VvyD65qTEOvIPFa?dl=0

Can you access?

There are 6996 rows having blank report name/attribute name (which i would also like to handle with some kind of WHERE ReportName <> ''), and 17081 total rows including duplicate reportName/attributeName pairs.

Expected result eg. Area Dashboard Details (ReportA) has 172 total distinct attributes (this is correct):

SQL based result

but with DAX, same report am getting 186 count:

DAX results

Phil_Seamark · ‎03-31-2017

Yep, got that and have imported into a local SQL DB and have subsequently imported into my PBI table.

I named the table [Report-AttributeMap]

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Phil_Seamark · ‎03-31-2017

I imported your data and for that scenario I got 172 (using my original queries). I think we might be getting close 🙂

Here is the PBIX file

https://wgtnpowerbi-my.sharepoint.com/personal/phil_wgtnpowerbi_onmicrosoft_com/_layouts/15/guestacc...

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

hxkresl · ‎03-31-2017

previously I shared with you the 'actual' table minus 3 metadata rows....The source table actually has additional columns that aren't germane to report but that I now realize are skewing outcome of DAX. If you please go back to drop box and try this with the file named schemaanddatafor_ReportTable2, the issue with offcounts will surface. <sorry>

https://www.dropbox.com/s/hbdiwz2cgtxylqq/schemaanddatafor_ReportTable2.sql?dl=0

I was trying to clean it up for ease, but in fact the other columns are causing issue and I don't know how to adapt DAX.

Phil_Seamark · ‎03-31-2017

Cool, that will hopefully help. I just have to do a few hours of outside chores (it's Saturday noon here) and will look at it when I get back.

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

hxkresl · ‎03-31-2017

yes, the measure column adds further distinctness. so Distinct needs to exclude from consideration the Measure and color columns.

Phil_Seamark · ‎03-31-2017

I think I have it.

Please update the code for the 'ReportAttributeCount' table (changes highlight)

ReportAttributeCount = SELECTCOLUMNS(
    SUMMARIZE(
            'Report-AttributeMap',
            'Report-AttributeMap'[ReportName],
            "Distinct Count",
            CALCULATE(DISTINCTCOUNT('Report-AttributeMap'[AttributeName])
                )
               ),
    "ReportA",[ReportName],
    "ReportA_AttributeCount",[Distinct Count]
	)

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

hxkresl · ‎03-31-2017

one last thing. How to exclude any rows where ReportName is blank ?

Phil_Seamark · ‎03-31-2017

You could try using the Query Editor to remove the blanks.

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

hxkresl · ‎03-31-2017

Thank you Phil for taking your time out of your weekend to help me.

I am guessing (since I don't have access to source to refresh) that the person who does, can add a step to remove blanks, which will then replay each time table is refreshed.

Know that this exchange is very powerful way of learning for me and I appreciate it very much. I must now sort through the nested FILTER, CROSSJOIN, FILTER, CROSSJOIN, SELECTCOLUMNS logic. It is very tricky.

Phil_Seamark · ‎03-31-2017

Hi @hxkresl

Do you have access to the Query Editor in Power BI? If so have a look at the technique I suggested to remove the blanks. It should work perfectly 🙂

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

hxkresl · ‎03-31-2017

Yes. Happy to see correct result 172!

and one last thing...

Phil_Seamark · ‎03-31-2017

Yay!

Nothing about DAX is especially complex. All the separate parts are pretty simple. It's just when you put it all together, sometimes it can become tricky.

If you can break it down into mini parts, it can sometimes help alot

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Phil_Seamark · ‎03-31-2017

Downloaded the file and now getting 186 for the RAC.ReportA_AttributeCount

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

any other place in the code I need to use DISTINCT to prevent double counting?

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025

How to Get Your Question Answered Quickly

Become a Certified Power BI Data Analyst!

any other place in the code I need to use DISTINCT to prevent double counting?

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025

How to Get Your Question Answered Quickly