Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
hxkresl
Helper V
Helper V

any other place in the code I need to use DISTINCT to prevent double counting?

On seperate post https://community.powerbi.com/t5/Desktop/DAX-equivalent-to-complex-SQL-query/m-p/151869#M65682  Phil gave me 3 DAX code segments which I have used to create a 100% DAX solution in place of SQL Query based one.  That's wonderful. But, I have one lingering problem of high RAC.ReportA_AttributeCount counts returning from the dataset which in SQL I resolve using DISTINCT. 

 

I have applied DISTINCT here (line 4) and it has helped reduce the inflated counts of attributeNames

#ReportAttributeCount = SELECTCOLUMNS(
                           SUMMARIZECOLUMNS(
                           'Report-AttributeMap'[Report Name],
                           "rptA - Total Attributes", COUNTROWS(DISTINCT('Report-AttributeMap'))
                           ),
                             "ReportA", 'Report-AttributeMap'[Report Name],
                             "rptA - Total Attributes", [rptA - Total Attributes]
)

 

but still some higher than true counts are returning.  the two other code segments are below.

CommonAttributesCount = 
               SUMMARIZE( FILTER(
                GENERATE(
                             SELECTCOLUMNS(
                                    Table2,
                                    "ReportName",'Table2'[ReportName],
                                    "AttributeName",'Table2'[AttributeName]
                                    ) ,                   
                            SELECTCOLUMNS(
                                Table2,
                                "ReportName2",'Table2'[ReportName],
                                "AttributeName2",'Table2'[AttributeName]
                                )
                              ),[AttributeName]=[AttributeName2] && [ReportName]< [ReportName2]
                              ),
                              [ReportName],
                              [ReportName2],
                              [AttributeName]
                              )
ReportTable = 

VAR CommonAttributesCount2 = 
    SELECTCOLUMNS(
        SUMMARIZE(
        'CommonAttributesCount',
        [ReportName],
        [ReportName2],
        "CommonAttributeCount",DISTINCTCOUNT('CommonAttributesCount'[AttributeName])),"CACA.ReportA",[ReportName],"CACA.ReportB",[ReportName2],"CACA.DC",[CommonAttributeCount])

VAR 
    FINAL = ADDCOLUMNS(
        FILTER(
            CROSSJOIN(
                FILTER(
                    CROSSJOIN(
                        SELECTCOLUMNS(
                            ReportAttributeCount,
                            "RAC.ReportA",[ReportA],
                            "RAC.ReportA_AttributeCount",
                            [ReportA_AttributeCount]
                            ),
                       CommonAttributesCount2),
                      [RAC.ReportA]=[CACA.ReportA]
                   ),
                SELECTCOLUMNS(
                    'ReportAttributeCount',
                    "RACB.ReportA",[ReportA],
                    "RACB.ReportA_AttributeCount",[ReportA_AttributeCount]
                    )
                ),[CACA.ReportB]=[RACB.ReportA])
                ,
                "reportApercentage",DIVIDE([CACA.DC],[RAC.ReportA_AttributeCount]),
                "reportBpercentage",DIVIDE([CACA.DC],[RACB.ReportA_AttributeCount]))
    
RETURN FINAL 

 

In addition to verified distinct SQL counts I have verified manually that the counts are higher than truly present in the underlying data (by doing export to excel for verification)

 

Can someone see anywhere in any of the code segments where DISTINCT can be added to ensure unique ReportName/AttributeName pairs are returned?  Or have I applied the DISTINCT function incorrectly?

1 ACCEPTED SOLUTION

I think I have it.

 

Please update the code for the 'ReportAttributeCount' table (changes highlight)

 

ReportAttributeCount = SELECTCOLUMNS(
    SUMMARIZE(
            'Report-AttributeMap',
            'Report-AttributeMap'[ReportName],
            "Distinct Count",
            CALCULATE(DISTINCTCOUNT('Report-AttributeMap'[AttributeName])
                )
               ),
    "ReportA",[ReportName],
    "ReportA_AttributeCount",[Distinct Count]
	)

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

View solution in original post

15 REPLIES 15
Phil_Seamark
Employee
Employee

Hi @hxkresl

 

Just a couple of questions that will help me.  So I understand the code is ok for the small sample dataset, yes?  🙂

 

Any chance you can give me a slightly bigger dataset including some duplication that you want to ignore.  It would be great if you can do this and also provide the expected result for the bigger dataset to help me pick the best place to apply the de-dup.

 

Cheers,

 

Phil


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Yes, no issues with duplicates (when using DISTINCT) on sample dataset. 

 

I generated schema and data script for the actual dataset, saved as file called schemaanddatafor_ReportTable, and placed it here

https://www.dropbox.com/sh/6kgldb1fhfichy4/AAAbXQhUr8VvyD65qTEOvIPFa?dl=0

 

Can you access?

 

There are 6996 rows having blank report name/attribute name (which i would also like to handle with some kind of WHERE ReportName <> ''), and 17081 total rows including duplicate reportName/attributeName pairs.

 

Expected result eg. Area Dashboard Details (ReportA) has 172 total distinct attributes (this is correct):

SQL based resultSQL based result

 

but with DAX, same report am getting 186 count:

DAX resultsDAX results

Yep, got that and have imported into a local SQL DB and have subsequently imported into my PBI table.

 

I named the table [Report-AttributeMap]


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

I imported your data and for that scenario I got 172 (using my original queries).  I think we might be getting close 🙂

 

Here is the PBIX file

 

https://wgtnpowerbi-my.sharepoint.com/personal/phil_wgtnpowerbi_onmicrosoft_com/_layouts/15/guestacc...

 

caca.png


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

previously I shared with you the 'actual' table minus 3 metadata rows....The source table actually has additional columns that aren't germane to report but that I now realize are skewing outcome of DAX.   If you please go back to drop box  and try this with the file named schemaanddatafor_ReportTable2, the issue with offcounts will surface. <sorry>

 

https://www.dropbox.com/s/hbdiwz2cgtxylqq/schemaanddatafor_ReportTable2.sql?dl=0

 

I was trying to clean it up for ease, but in fact the other columns are causing issue and I don't know how to adapt DAX.

Cool, that will hopefully help.  I just have to do a few hours of outside chores (it's Saturday noon here) and will look at it when I get back.


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

yes, the measure column adds further distinctness.  so Distinct needs to exclude from consideration the Measure and color columns.

I think I have it.

 

Please update the code for the 'ReportAttributeCount' table (changes highlight)

 

ReportAttributeCount = SELECTCOLUMNS(
    SUMMARIZE(
            'Report-AttributeMap',
            'Report-AttributeMap'[ReportName],
            "Distinct Count",
            CALCULATE(DISTINCTCOUNT('Report-AttributeMap'[AttributeName])
                )
               ),
    "ReportA",[ReportName],
    "ReportA_AttributeCount",[Distinct Count]
	)

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

one last thing. How to exclude any rows where ReportName is blank  ? 

Capture9.PNG

 

You could try using the Query Editor to remove the blanks.

 

remove blanks.png


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Thank you Phil for taking your time out of your weekend to help me.

 

I am guessing (since I don't have access to source to refresh) that the person who does, can add a step to remove blanks, which will then replay each time table is refreshed.

 

Know that this exchange is very powerful way of learning for me and I appreciate it very much.  I must now sort through the nested FILTER, CROSSJOIN, FILTER, CROSSJOIN, SELECTCOLUMNS logic.  It is very tricky.

Hi @hxkresl

 

Do you have access to the Query Editor in Power BI?  If so have a look at the technique I suggested to remove the blanks.  It should work perfectly 🙂


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Yes.  Happy to see correct result 172!

 

and one last thing...

 

Yay!

 

Nothing about DAX is especially complex.  All the separate parts are pretty simple.  It's just when you put it all together, sometimes it can become tricky.

 

If you can break it down into mini parts, it can sometimes help alot


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Downloaded the file and now getting 186 for the RAC.ReportA_AttributeCount


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.