Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
hoongary
Frequent Visitor

Need help with SUMMARIZE and SUMX in a multi FACT data model

Hi Everyone,

I need help in the following scenario:

I have a muti FACT data model with 2 FACT tables and 2 DIM tables.Each DIM has a one-to-many relationship to each FACT: 

 

diagram.png

 

I want to create a table visualisation which has data on the rows from both DIM tables. For the values, I need to create a logic which distinct counts the [activation_id] column from both FACT tables, and if the combined sum of these distinct counts are > 0, assign value 1, otherwise assign 0. Then I want to sum up these values starting from the lowest level of hierarchy on rows, and as the final output of this measure I need to show the summed up values for every row, on sub totals and on the grand total too.

 

Here is the measure what I'm using:
(It might be far from perfect)

 

 

Measure =
VAR  _SelectColumnsPlatformA = SELECTCOLUMNS(
    'FACT Platform_A',
    "Product_ID", 'FACT Platform_A'[Product_ID],
    "Customer_ID", 'FACT Platform_A'[Customer_ID],
    "Total_Activations", CALCULATE( DISTINCTCOUNT( 'FACT Platform_A'[activation_id] ) )
)

VAR  _SelectColumnsPlatformB = SELECTCOLUMNS(
    'FACT Platform_B',
    "Product_ID", 'FACT Platform_B'[Product_ID],
    "Customer_ID", 'FACT Platform_B'[Customer_ID],
    "Total_Activations", CALCULATE( DISTINCTCOUNT( 'FACT Platform_B'[activation_id] ) )
)

VAR _UnionFacts = UNION(_SelectColumnsPlatformA, _SelectColumnsPlatformB)
VAR  _SummarizedFacts = SUMMARIZE(
    _UnionFacts,
    [Product_ID],
    [Customer_ID],
    [Total_Activations]
)
VAR _AddTransformedActivations = ADDCOLUMNS(
    _SummarizedFacts,
    "Transformed_Activations", IF([Total_Activations] > 0, 1, 0)
)
VAR  _Results = SUMX(
    _AddTransformedActivations,
    [Transformed_Activations]   
)

RETURN 
 _Results

 


 The issue is that at the lowest level (which is from DIM Customer) the summed values are duplicating. So instead of just showing values 1, it is also showing values > 1.

 

Untitled design.png

 

I assume it's because the combination of CustomerID and ProductID are not unique at _SummarizedFacts variable, and then because of context transition the "Transformed_Activations" sums up 1 every time it finds the corresponding CustomerID, and hence the duplicated results.

 

Can someone help me to achieve my goal?
(which is to assign value 1 for every CustomerID, then sum it up to show correct numbers on sub totals and on grand totals. Unioning the 2 FACT's in a physical table is not an option here)

 

Thanks for reading this!

1 ACCEPTED SOLUTION

OK. It sounds like the Subject Group / Customer granularity is what you're after, so that's the level you want to summarize over:

Total Adoptions =
VAR _PlatformA =
    SUMMARIZE (
        'FACT Platform_A',
        "Subject_Group", 'DIM Product'[Subject_Group],
        "Customer_ID", 'FACT Platform_A'[Customer_ID],
        "Total_Activations", DISTINCTCOUNT ( 'FACT Platform_A'[activation_id] )
    )
VAR _PlatformB =
    SUMMARIZE (
        'FACT Platform_B',
        "Subject_Group", 'DIM Product'[Subject_Group],
        "Customer_ID", 'FACT Platform_B'[Customer_ID],
        "Total_Activations", DISTINCTCOUNT ( 'FACT Platform_B'[activation_id] )
    )
VAR _SummarizedFacts =
    GROUPBY (
        UNION ( _PlatformA, _PlatformB ),
        [Subject_Group],
        [Customer_ID],
        "Total_Activations", SUMX ( CURRENTGROUP (), [Total_Activations] )
    )
VAR _Results =
    COUNTROWS ( FILTER ( _SummarizedFacts, [Total_Activations] > 0 ) )
RETURN
    _Results

 

View solution in original post

15 REPLIES 15
hoongary
Frequent Visitor

Hi @AlexisOlson ,

I'm trying to solve a new business requirement which is related to your solution.
I'm not sure if what I'm doing is right, so I'm curious to know what you think.

The measure that was created with your help calculates the Total Adoptions. 
My new requirement is to create 3 additional columns:
- Won Adoptions: CY Activations > 0 and (PY Activations = 0 or PY Activations is blank) 
- Lost Adoptions: (CY Activations = 0 or CY Activations is blank) and PY Activations > 0
- Retained Adoptions: CY Activations > 0 and PY Activations > 0

The report uses an external filter which is filtered for Year 2023.
So in the chart where the below measure will be used, because of the applied filter on Year, the "Total_Activations" will calculate activations for 2023, and "PY Activations" will calculate activations for 2022 (but please correct me if I'm wrong) .

The additions that I have made on the previous version is that I have added PY Activations into the formula in _PlatformA, _PlatformB and in _SummarizedFacts.
Then I have added the _AddSegments part, where I'm creating the segments based on the above described logic.
As the last step, in _ResultsWon I'm filtering the _AddSegments to only include rows where [Won] = 1, and by counting the rows I assume I get the correct results for the Won segment.
Then I'd duplicate this measure to get Lost Adoptions and Retained Adoptions by changing the variable in the RETURN clause with _ResultsLost and _ResultsRetained.

Here is the code for Won Adoptions:

 

Won Adoptions =
VAR _PlatformA =
    SUMMARIZE (
        'FACT Platform_A',
        "Subject_Group", 'DIM Product'[Subject_Group],
        "Customer_ID", 'FACT Platform_A'[Customer_ID],
        "Total_Activations", CALCULATE(DISTINCTCOUNT ( 'FACT Platform_A'[activation_id] ) ),
		"PY Activations",
            CALCULATE(
                DISTINCTCOUNT( 'FACT Platform_A'[activation_id] ),
                CALCULATETABLE( SAMEPERIODLASTYEAR( 'DIM Date Table'[Date] ), 'DIM Date Table'[YTD Flag] = TRUE( ) )
            )
    )
VAR _PlatformB =
    SUMMARIZE (
        'FACT Platform_B',
        "Subject_Group", 'DIM Product'[Subject_Group],
        "Customer_ID", 'FACT Platform_B'[Customer_ID],
        "Total_Activations", CALCULATE( DISTINCTCOUNT ( 'FACT Platform_B'[activation_id] ) ),
		"PY Activations",
            CALCULATE(
                DISTINCTCOUNT( 'FACT Platform_B'[activation_id] ),
                CALCULATETABLE( SAMEPERIODLASTYEAR( 'DIM Date Table'[Date] ), 'DIM Date Table'[YTD Flag] = TRUE( ) )
    )
VAR _SummarizedFacts =
    GROUPBY (
        UNION ( _PlatformA, _PlatformB ),
        [Subject_Group],
        [Customer_ID],,
        "Total_Activations_Sum", SUMX( CURRENTGROUP( ), [Total_Activations] ),
        "PY_Activations_Sum", SUMX( CURRENTGROUP( ), [PY Activations] )
    )
VAR _AddSegments =
    ADDCOLUMNS(
        _SummarizedFacts,
        "Won", IF( [Total_Activations_Sum] > 0 && COALESCE( [PY_Activations_Sum], 0 ) = 0, 1 ),
		"Lost", IF( COALESCE( [Total_Activations_Sum], 0 ) = 0 && [PY_Activations_Sum] > 0, 1 ),
		"Retained", IF( [Total_Activations_Sum] > 0 && [PY_Activations_Sum] > 0, 1 )
    )	
VAR _ResultsWon =
    COUNTROWS ( FILTER ( _AddSegments, [Won] = 1) )
VAR _ResultsLost =
    COUNTROWS ( FILTER ( _AddSegments, [Lost] = 1) )
VAR _ResultsRetained =
    COUNTROWS ( FILTER ( _AddSegments, [Retained] = 1) )		
RETURN
    _ResultsWon

 

 
These new columns would be used in the exact same chart like the previous version, so the granularity is the same.
I assume that the totals of the [Total Adoptions] measure should be equal to [Won Adoptions] + [Lost Adoptions] + [Retained Adoptions].

Now after all, what do you think about my solution that I have provided above? Do you think it is accurate and can be trusted?
Do you have a better idea maybe?
(For example it is not going to work for the Lost segment because Summarize will only give me results back where [Total_Activations] are not blank and its greater than 1)


The main issue I see here is along the lines you mentioned. You won't have the full set of customers you need since the SUMMARIZE will only include customers in the current filter context.

 

You can probably fix this by removing date context when assembling the tables of groups and customers.

Won Adoptions =
VAR _A_Customers_Groups =
    CALCULATETABLE (
        SUMMARIZE (
            'FACT Platform_A',
            "Subject_Group", 'DIM Product'[Subject_Group],
            "Customer_ID", 'FACT Platform_A'[Customer_ID]
        ),
        ALL ( 'DIM Date Table' )
    )
VAR _B_Customers_Groups =
    CALCULATETABLE (
        SUMMARIZE (
            'FACT Platform_B',
            "Subject_Group", 'DIM Product'[Subject_Group],
            "Customer_ID", 'FACT Platform_B'[Customer_ID]
        ),
        ALL ( 'DIM Date Table' )
    )
VAR _PlatformA =
    ADDCOLUMNS (
        _A_Customers_Groups,
        "Total_Activations",
            CALCULATE (
                DISTINCTCOUNT ( 'FACT Platform_A'[activation_id] )
            ),
        "PY Activations",
            CALCULATE (
                DISTINCTCOUNT ( 'FACT Platform_A'[activation_id] ),
                CALCULATETABLE (
                    SAMEPERIODLASTYEAR ( 'DIM Date Table'[Date] ),
                    'DIM Date Table'[YTD Flag] = TRUE ()
                )
            )
    )
VAR _PlatformB =
    ADDCOLUMNS (
        _B_Customers_Groups,
        "Total_Activations",
            CALCULATE (
                DISTINCTCOUNT ( 'FACT Platform_B'[activation_id] )
            ),
        "PY Activations",
            CALCULATE (
                DISTINCTCOUNT ( 'FACT Platform_B'[activation_id] ),
                CALCULATETABLE (
                    SAMEPERIODLASTYEAR ( 'DIM Date Table'[Date] ),
                    'DIM Date Table'[YTD Flag] = TRUE ()
                )
            )
    )
[...]

Hi @AlexisOlson ,
your solution works perfectly for all the 3 segments!
Thank you very much, again!

So can I say that the lesson learned here is that I need to use CALCULATETABLE to remove the filters from the DIM Date Table whenever I face a similar task where I need to compare CY and PY values inside a table manipulation function (like SUMMARIZE/ADDCOLUMNS/GROUPBY) ?

The general lesson is that if you need to consider dates outside of the currently filtered set of dates in your logic, then you will need to adjust your date filter context to include those dates. I did it in a very broad way here by removing all 'Dim Date Table' filtering but you could be more targeted if needed.

Got it, thank you!

hoongary
Frequent Visitor

Hi @AlexisOlson 

There are 3 dimensions on the rows:
1st (top level) : Subject Master Group (from DIM Product)
2nd: Subject Group (from DIM Product)
3rd (bottom level): Customer ID (from DIM Customer)

The 1st and 2nd levels are higher in hierarchy than Product ID (Subject Group contains several Product ID's, Subject Master Group contains several Subject Groups. The same Customer ID can exists under different Subject Groups and therefore under different Subject Master Groups.

However I took a different approach, and I was able to get the correct numbers at the bottom level (so at Customer ID from DIM Customer) and at the 2nd level (Subject Group from DIM Product).
But I get incorrect results at the top level (Subject Master Group from DIM Product) and at the grand total level, and I don't understand that if it is working correctly at the Subject Group level why it is incorrect at the Subject Master Group level and hence at the grand total level. 

2nd_approach_blur.png

Numbers in the green frame correct at the 3rd level (7x1), and sums up correctly at the 2nd level (7x1=7) , and it works for every row. at this level
Then the orange frame (top level) is incorrect ( 53 + 7 + 2 != 61 ), and the grand total is totally incorrect.

This is the full measure:

 

VAR _Calculations =
    SUMX(
        VALUES(
            'DIM Customer'[Customer_ID]
        ),
        IF( [Total Activations] > 0, 1)
    )

RETURN
    _Calculations

 



When DISTINCTCOUNT is involved, it's common for totals not to be the sum of the parts since the parts can overlap. and forcing the totals to be the sum of the parts means you're counting some customers multiple times. Whether this is "correct" or not depends on your specific use case and I don't know yours specifically.

 

In order to get the proper result for your use case, you need to decide what counts as a single adoption. If the same customer adopts two separate products, should that count as one or two? If those two products are in the same group, do you want the count at that group level to be one or two? If two customers adopt the same two products from a single group, should the count at the group level be 2 or 4?

The measure I provided assumed that each unique product/customer combination counts separately and the measure you defined just above assumes that only the unique customer counts. Since neither of these is what you want, what exactly is the granularity that you're interested in?

If the same customer activated two products which fall under the same Subject Group, should be counted only once, so Subject Group A sub total = 1.
If the same customer activated two products which fall under different Subject Group, should be counted under each Subject Group. So Subject Group A sub total = 1 and Subject Group B sub total = 1.


If Subject Group A and Subject Group B falls under the same Subject Master Group, then Subject Group A + Subject Group B should = 2 at the Subject Master Group sub total.

If Subject Group A and Subject Group B falls under different Subject Master Group, then Subject Master Group A should be = 1 and Subject Master Group B should be = 1 at the sub total level.


The grand total should be the sum of every Subject Master Group's sub total, based on the logic defined above.

 

Answering your questions:
If the same customer adopts two separate products, should that count as one or two?
If those two products are in the same group, do you want the count at that group level to be one or two?
- If customer is the same, and if these products are under different Subject Group, should be counted once for Subject Group A, and once for Subject Group B.
- If customer is the same, and if these products are under the same Subject Group, should be counted once.


If two customers adopt the same two products from a single group, should the count at the group level be 2 or 4?
- The count should be 2 at the group level.



I haven't showed you the [Total Activations] measure which was used in the [Total Adoptions] measure, so I'll put both of the measures here:

		
Total Activations = 
VAR _FACT_A =
            CALCULATE(
                DISTINCTCOUNT('FACT FACT_A'[activation_id])
            )

VAR _FACT_B =
            CALCULATE(
                DISTINCTCOUNT('FACT FACT_B'[activation_id])
            )


VAR _Results = 
            _FACT_A + _FACT_B

RETURN 
_Results



Total Adoptions = 
VAR _Calculations =
    SUMX(
        VALUES(
            'DIM Customer'[Customer_ID]
        ),
        IF( [Total Activations] > 0, 1)
    )

RETURN
    _Calculations
	

 
I hope this gives you a bit more context.

OK. It sounds like the Subject Group / Customer granularity is what you're after, so that's the level you want to summarize over:

Total Adoptions =
VAR _PlatformA =
    SUMMARIZE (
        'FACT Platform_A',
        "Subject_Group", 'DIM Product'[Subject_Group],
        "Customer_ID", 'FACT Platform_A'[Customer_ID],
        "Total_Activations", DISTINCTCOUNT ( 'FACT Platform_A'[activation_id] )
    )
VAR _PlatformB =
    SUMMARIZE (
        'FACT Platform_B',
        "Subject_Group", 'DIM Product'[Subject_Group],
        "Customer_ID", 'FACT Platform_B'[Customer_ID],
        "Total_Activations", DISTINCTCOUNT ( 'FACT Platform_B'[activation_id] )
    )
VAR _SummarizedFacts =
    GROUPBY (
        UNION ( _PlatformA, _PlatformB ),
        [Subject_Group],
        [Customer_ID],
        "Total_Activations", SUMX ( CURRENTGROUP (), [Total_Activations] )
    )
VAR _Results =
    COUNTROWS ( FILTER ( _SummarizedFacts, [Total_Activations] > 0 ) )
RETURN
    _Results

 

Now this solution gives me exactly what I was looking for! 👍 🎉


The only thing I have changed is transformed your SUMMARIZE part in _Platforms to ADDCOLUMNS/SUMMARIZE:

 

 

VAR _PlatformA =
    ADDCOLUMNS(
        SUMMARIZE(
            'FACT Platform_A',
            'DIM Product'[subject_group],
            'FACT Platform_A'[Customer_ID]
        ),
        "Total_Activations", DISTINCTCOUNT( 'FACT Platform_A'[activation_id] )
    )
VAR _PlatformB =
    ADDCOLUMNS(
        SUMMARIZE(
            'FACT _Platform_B ',
            'DIM Product'[subject_group],
            'FACT _Platform_B '[Customer_ID]
        ),
        "Total_Activations", DISTINCTCOUNT( 'FACT _Platform_B '[activation_id] )
    )

 

 

 I don't think that I'd have ever figured out that in the SUMMARIZE parts I need to use Subject Group from the DIM and Customer ID from the FACT.


Thank you very much for your help and your time! 👏👏

Great!

 

Note that if you use ADDCOLUMNS, you probably also want to use CALCULATE around DISTINCTCOUNT.
https://www.sqlbi.com/articles/best-practices-using-summarize-and-addcolumns/

You are right, thanks!

AlexisOlson
Super User
Super User

Unioning the physical tables seems like the easiest option if possible. Since you've specified it's not, try updating your _SummarizedFacts variable to this:

VAR _SummarizedFacts =
    GROUPBY (
        _UnionFacts,
        [Product_ID],
        [Customer_ID],
        "Total_Activations", SUMX ( CURRENTGROUP (), [Total_Activations] )
    )

 

Full measure:

a =
VAR _SelectColumnsPlatformA =
    SELECTCOLUMNS (
        'FACT Platform_A',
        "Product_ID", 'FACT Platform_A'[Product_ID],
        "Customer_ID", 'FACT Platform_A'[Customer_ID],
        "Total_Activations", CALCULATE ( DISTINCTCOUNT ( 'FACT Platform_A'[activation_id] ) )
    )
VAR _SelectColumnsPlatformB =
    SELECTCOLUMNS (
        'FACT Platform_B',
        "Product_ID", 'FACT Platform_B'[Product_ID],
        "Customer_ID", 'FACT Platform_B'[Customer_ID],
        "Total_Activations", CALCULATE ( DISTINCTCOUNT ( 'FACT Platform_B'[activation_id] ) )
    )
VAR _UnionFacts =
    UNION ( _SelectColumnsPlatformA, _SelectColumnsPlatformB )
VAR _SummarizedFacts =
    GROUPBY (
        _UnionFacts,
        [Product_ID],
        [Customer_ID],
        "Total_Activations", SUMX ( CURRENTGROUP (), [Total_Activations] )
    )
VAR _Results =
    COUNTROWS ( FILTER ( _UnionFacts, [Total_Activations] > 0 ) )
RETURN
    _Results

Hi @AlexisOlson 
Thanks for your solution.

I think instead of _UnionFacts you mean _SummarizedFacts as the table for the FILTER():

VAR _Results =
    COUNTROWS ( FILTER ( _SummarizedFact, [Total_Activations] > 0 ) )

 
I've tried your solution, but unfortunatelly I get the exact same results:

 

round_2.png

Full measure:

VAR _SelectColumnsPlatformA =
    SELECTCOLUMNS(
        'FACT Platform_A',
        "Product_ID", 'FACT Platform_A'[Product_ID],
        "Customer_ID", 'FACT Platform_A'[Customer_ID],
        "Total_Activations",
            CALCULATE(
                DISTINCTCOUNT( 'FACT Platform_A'[activation_id] )
            )
    )
VAR _SelectColumnsPlatformB =
    SELECTCOLUMNS(
        'FACT Platform_B',
        "Product_ID", 'FACT Platform_B'[Product_ID],
        "Customer_ID", 'FACT Platform_B'[Customer_ID],
        "Total_Activations",
            CALCULATE(
                DISTINCTCOUNT( 'FACT Platform_B'[activation_id] )
            )
    )
VAR _UnionFacts =
    UNION( _SelectColumnsPlatformA, _SelectColumnsPlatformB )
VAR _SummarizedFacts =
    GROUPBY(
        _UnionFacts,
        [Product_ID],
        [Customer_ID],
        "Total_Activations",
            SUMX( CURRENTGROUP( ), [Total_Activations] )
    )
VAR _Results =
    COUNTROWS(
        FILTER( _SummarizedFacts, [Total_Activations] > 0 )
    )
RETURN
    _Results





You're correct. That is what I meant.

 

What are the dimensions you have in the blurred-out section? If it's equivalent to Product_ID and Customer_ID (over vice versa) then I'm not sure what's up. In particular, do any of the (expanded) rows in the matrix visual correspond to multiple products and/or multiple customers? If the matrix dimensions are at a higher level than Product_ID and Customer_ID, I'd expect some rows to measure > 1.

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.