Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
I have SQL query that takes a two column table (ReportName, Attribute) and arranges data to show how much any two reports have in common, in terms of shared attributes, as count and percentage.
Here's what power bi report looks like using SQL dataset.
I want to create exactly same using DAX against source table (eliminate SQL query dependency). SQL query and inputs below.
CREATE TABLE #ReportTable ( ReportName varchar (20), AttributeName varchar(30), ); INSERT INTO #ReportTable (ReportName, AttributeName) VALUES ('Compliance Details', 'Vendor Name'), ('Compliance Details', 'Area'), ('Compliance Details', 'Area'), ('Compliance Details', 'Area'), ('Compliance Details', 'Region'), ('Compliance Details', 'SubRegion'), ('Delivery and Invoice', 'Customer Name'), ('Delivery and Invoice', 'Area'), ('Delivery and Invoice', 'Region'), ('Delivery and Invoice', 'SubRegion'), ('Operations Review', 'Customer Name'), ('Operations Review', 'Approver'), ('Operations Review', 'Approval Status'); WITH ReportAttributeCount AS ( SELECT ReportName as ReportA, COUNT(AttributeName) AS ReportA_AttributeCount FROM #ReportTable GROUP BY ReportName ), CommonAttributesCount AS ( SELECT RT.ReportName AS ReportA, RT2.ReportName AS ReportB, COUNT(DISTINCT RT.AttributeName) AS CommonAttributeCount --convert to DAX 1 FROM #ReportTable AS RT INNER JOIN #ReportTable AS RT2 ON RT.ReportName < RT2.ReportName AND RT.AttributeName = RT2.AttributeName GROUP BY RT.ReportName, RT2.ReportName ) SELECT DISTINCT RAC.ReportA, CACA.CommonAttributeCount, CACA.ReportB, --convert to DAX 2 (CACA.CommonAttributeCount * 1.00) / RAC.ReportA_AttributeCount AS reportApercentage,--convert to DAX 3 RAC.ReportA_AttributeCount, (CACA.CommonAttributeCount * 1.00) / RACB.ReportA_AttributeCount AS reportBpercentage,--convert to DAX 4 RACB.ReportA_AttributeCount AS ReportB_AttributeCount --convert to DAX 5 FROM ReportAttributeCount AS RAC INNER JOIN CommonAttributesCount AS CACA ON RAC.ReportA = CACA.ReportA INNER JOIN ReportAttributeCount AS RACB ON CACA.ReportB = RACB.ReportA; DROP TABLE #ReportTable;
I want to begin by converting the CommonAttributeCount to DAX, and this requires a self join. I don't know where to begin. Pls help walk me through this.
Solved! Go to Solution.
Hi there,
Please try adding the following three calcuated tables:
ReportAttributeCount = SELECTCOLUMNS( SUMMARIZECOLUMNS( 'Table2'[ReportName], "ReportA_AttributeCount" , COUNTROWS('Table2') ), "ReportA",[ReportName], "ReportA_AttributeCount",[ReportA_AttributeCount] )
CommonAttributesCount = SUMMARIZE( FILTER( GENERATE( SELECTCOLUMNS( Table2, "ReportName",'Table2'[ReportName], "AttributeName",'Table2'[AttributeName] ) , SELECTCOLUMNS( Table2, "ReportName2",'Table2'[ReportName], "AttributeName2",'Table2'[AttributeName] ) ),[AttributeName]=[AttributeName2] && [ReportName]< [ReportName2] ), [ReportName], [ReportName2], [AttributeName] )
and finally
ReportTable = VAR CommonAttributesCount2 = SELECTCOLUMNS( SUMMARIZE( 'CommonAttributesCount', [ReportName], [ReportName2], "CommonAttributeCount",DISTINCTCOUNT('CommonAttributesCount'[AttributeName])),"CACA.ReportA",[ReportName],"CACA.ReportB",[ReportName2],"CACA.DC",[CommonAttributeCount]) VAR FINAL = ADDCOLUMNS( FILTER( CROSSJOIN( FILTER( CROSSJOIN( SELECTCOLUMNS( ReportAttributeCount, "RAC.ReportA",[ReportA], "RAC.ReportA_AttributeCount", [ReportA_AttributeCount] ), CommonAttributesCount2), [RAC.ReportA]=[CACA.ReportA] ), SELECTCOLUMNS( 'ReportAttributeCount', "RACB.ReportA",[ReportA], "RACB.ReportA_AttributeCount",[ReportA_AttributeCount] ) ),[CACA.ReportB]=[RACB.ReportA]) , "reportApercentage",DIVIDE([CACA.DC],[RAC.ReportA_AttributeCount]), "reportBpercentage",DIVIDE([CACA.DC],[RACB.ReportA_AttributeCount])) RETURN FINAL
which for me returns this as a result
Hi there,
Please try adding the following three calcuated tables:
ReportAttributeCount = SELECTCOLUMNS( SUMMARIZECOLUMNS( 'Table2'[ReportName], "ReportA_AttributeCount" , COUNTROWS('Table2') ), "ReportA",[ReportName], "ReportA_AttributeCount",[ReportA_AttributeCount] )
CommonAttributesCount = SUMMARIZE( FILTER( GENERATE( SELECTCOLUMNS( Table2, "ReportName",'Table2'[ReportName], "AttributeName",'Table2'[AttributeName] ) , SELECTCOLUMNS( Table2, "ReportName2",'Table2'[ReportName], "AttributeName2",'Table2'[AttributeName] ) ),[AttributeName]=[AttributeName2] && [ReportName]< [ReportName2] ), [ReportName], [ReportName2], [AttributeName] )
and finally
ReportTable = VAR CommonAttributesCount2 = SELECTCOLUMNS( SUMMARIZE( 'CommonAttributesCount', [ReportName], [ReportName2], "CommonAttributeCount",DISTINCTCOUNT('CommonAttributesCount'[AttributeName])),"CACA.ReportA",[ReportName],"CACA.ReportB",[ReportName2],"CACA.DC",[CommonAttributeCount]) VAR FINAL = ADDCOLUMNS( FILTER( CROSSJOIN( FILTER( CROSSJOIN( SELECTCOLUMNS( ReportAttributeCount, "RAC.ReportA",[ReportA], "RAC.ReportA_AttributeCount", [ReportA_AttributeCount] ), CommonAttributesCount2), [RAC.ReportA]=[CACA.ReportA] ), SELECTCOLUMNS( 'ReportAttributeCount', "RACB.ReportA",[ReportA], "RACB.ReportA_AttributeCount",[ReportA_AttributeCount] ) ),[CACA.ReportB]=[RACB.ReportA]) , "reportApercentage",DIVIDE([CACA.DC],[RAC.ReportA_AttributeCount]), "reportBpercentage",DIVIDE([CACA.DC],[RACB.ReportA_AttributeCount])) RETURN FINAL
which for me returns this as a result
Thank you, that's great that it can be done.
Working well with demo data:
Woohoo, nice work
If you break apart the code it kinda reads like SQL. Although in DAX, to do something similar to an SQL inner join, you can combine the FILTER(CROSSJOIN(....)...) functions. the rest is just renaming columns so you don't end up with two column names the same.
I've applied DISTINCT to the DAX you gave since there are often duplicate reportname/fieldname pairs in the report table. In the sample dataset this fixes issues of duplicate count, but in real dataset this DISTINCT isn't doing enough to handle for higher than "manual count" of numbers. (SQL Query is handling duplicates well)
You can see I've added DISTINCT
#ReportAttributeCount = SELECTCOLUMNS( SUMMARIZECOLUMNS( 'Report-AttributeMap'[Report Name], "rptA - Total Attributes", COUNTROWS(DISTINCT('Report-AttributeMap')) ), "ReportA", 'Report-AttributeMap'[Report Name], "rptA - Total Attributes", [rptA - Total Attributes] )
to get
notice Compliance Details now has 4 totalreportA attributes instead of 6.
No issues of inflated RACB.ReportA_AttributeCount, only with RAC.ReportA_AttributeCounts.
Is there any other place in the DAX formulas that could be generating duplicates?
Hi @hxkresl
Just a couple of questions that will help me. So I understand the code is ok for the small sample dataset, yes? 🙂
Any chance you can give me a slightly bigger dataset including some duplication that you want to ignore. It would be great if you can do this and also provide the expected result for the bigger dataset to help me pick the best place to apply the de-dup.
Cheers,
Phil
Hi @Phil_Seamark ,
I had heavy code writen in the stored procs to get the data which involves lot of temp table joins and where conditions .(Currently this proc is inserting data into sql table from there SSRS repor is rendering)
If i want to convert this SSRS to powerBi report which one would be faster DAX or get data from stored proc as source
Which once would be technically best suits .
Could you plese suggest.