Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
charleshale
Continued Contributor
Continued Contributor

Count Frequency of measure over X on large table

I am trying to assess how many patch news sites get over 1000 posts a year (based on date post created).   I'm dealing with a 100m row table.

 

Can anyone see obvious reasons why the following fails to filter on _Threshold?   

Count # Sites > 1000 Posts = 
VAR _Threshold = 1000
VAR _Measure = 
CALCULATE( 
     COUNTAX( 
            KEEPFILTERS(VALUES(AllContent[patch_id])),     
            [#Posts|Distinct]>_Threshold), 
     USERELATIONSHIP(DimDate[Date], AllContent[date_create])) 

RETURN
_Measure    


I am trying to avoid a more memory "expensive" virtual table measure such as the following, which takes so much memory I can't even tell if it works (but should).

Count # Sites > 1000 Posts = 
VAR _Threshold = 1000
VAR _MyTable = 
SUMMARIZE(AllContent, AllContent[patch_id], "@Posts", if( [#Posts|Distinct]>_Threshold,1,0 ))
RETURN
COUNTX(_mytable, [@Posts])

 

2 ACCEPTED SOLUTIONS
v-yanjiang-msft
Community Support
Community Support

Hi @charleshale ,

According to your description, I create a sample.

vkalyjmsft_0-1642045097947.png

If I set the threshold is 1, the expected result will be 2(A and B).

Here's my measure formula.

Count # Sites > 1 Posts = 
VAR _Threshold = 1
VAR _COUNT =
    CALCULATE (
        DISTINCTCOUNT ( 'Table'[Patch_id] ),
        FILTER (
            'Table',
            CALCULATE (
                COUNT ( 'Table'[Merged] ),
                ALLEXCEPT ( 'Table', 'Table'[Patch_id] )
            ) > _Threshold
        )
    )
RETURN
    _COUNT

Get the expected result.

vkalyjmsft_1-1642045299427.png

I attach my sample below for reference.

 

Best Regards,
Community Support Team _ kalyj

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

View solution in original post

Hi there.   Thank you for the solution.   I have edited slightly and have the following two alternate formulations that work.  The following is the fastest on my giant table.

Count # Sites 2 = 

    VAR _Threshold = 98

    VAR _MyTable =

        FILTER ( VALUES ( 'Table'[Patch_id] ), 
        CALCULATE(SUM('Table'[Merged]),ALLEXCEPT('Table','Table'[Patch_id])) > _Threshold )  
       -- the above CALCULATE () IS A BASIC COLUMN COUNT THAT CAN BE REPLACED BY A [measure] >_Threshold
    VAR _Result =

        COUNTROWS ( _MyTable )

    RETURN

        _Result

  

The following is memory intensive but perhaps the simplest code

Count # Sites 3 = 
VAR _Threshold = 98
VAR _MyTable = 
SUMMARIZE('Table','Table'[Patch_id],
"@Posts", If (SUM([Merged]) > _Threshold ,1,0 ))
RETURN
sumx(_MyTable, [@Posts])

 

These yield the expcted result of 1 (ie only 1 patch ID -- G --  has the sum of Merged > 98.

 

Here is the file

REVISED COUNT 

 

 

View solution in original post

4 REPLIES 4
v-yanjiang-msft
Community Support
Community Support

Hi @charleshale ,

According to your description, I create a sample.

vkalyjmsft_0-1642045097947.png

If I set the threshold is 1, the expected result will be 2(A and B).

Here's my measure formula.

Count # Sites > 1 Posts = 
VAR _Threshold = 1
VAR _COUNT =
    CALCULATE (
        DISTINCTCOUNT ( 'Table'[Patch_id] ),
        FILTER (
            'Table',
            CALCULATE (
                COUNT ( 'Table'[Merged] ),
                ALLEXCEPT ( 'Table', 'Table'[Patch_id] )
            ) > _Threshold
        )
    )
RETURN
    _COUNT

Get the expected result.

vkalyjmsft_1-1642045299427.png

I attach my sample below for reference.

 

Best Regards,
Community Support Team _ kalyj

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Hi there.   Thank you for the solution.   I have edited slightly and have the following two alternate formulations that work.  The following is the fastest on my giant table.

Count # Sites 2 = 

    VAR _Threshold = 98

    VAR _MyTable =

        FILTER ( VALUES ( 'Table'[Patch_id] ), 
        CALCULATE(SUM('Table'[Merged]),ALLEXCEPT('Table','Table'[Patch_id])) > _Threshold )  
       -- the above CALCULATE () IS A BASIC COLUMN COUNT THAT CAN BE REPLACED BY A [measure] >_Threshold
    VAR _Result =

        COUNTROWS ( _MyTable )

    RETURN

        _Result

  

The following is memory intensive but perhaps the simplest code

Count # Sites 3 = 
VAR _Threshold = 98
VAR _MyTable = 
SUMMARIZE('Table','Table'[Patch_id],
"@Posts", If (SUM([Merged]) > _Threshold ,1,0 ))
RETURN
sumx(_MyTable, [@Posts])

 

These yield the expcted result of 1 (ie only 1 patch ID -- G --  has the sum of Merged > 98.

 

Here is the file

REVISED COUNT 

 

 

VahidDM
Super User
Super User

Hi @charleshale 

 

Check  [#Posts|Distinct]>_Threshold) in your formula.

When you use VALUES and the input parameter is a column name, returns a one-column table that contains the distinct values from the specified column. yoiu used AllContent[patch_id] in the values formula and then you used the another column (I think this is the issue).

 

 

Can you share a sample of your data and the expected result in a text format?


Appreciate your Kudos!!
LinkedIn: 
www.linkedin.com/in/vahid-dm/

 

 

charleshale
Continued Contributor
Continued Contributor

Thanks, @VahidDM .  It's certainly something like that.   

 

The data table looks like this, simplified:   the date of the post (AllContent[date_create]), a hashed title-date-author for the unique post (AllContent[Merged]), and the numerical id representing the particular patch.com local news site (AllContent[patch_id])

charleshale_1-1641779130437.png

In this instance, let's say I set the Threshold to >=2, the measure would return the value 1 because only one site (10997 -- which happens to be the New York City Patch) would have more than 2 posts in the timeframe specified.

 

The 2nd formulation I did works but it is too memory intensive.  Thanks for any thoughts!!    

 

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.