The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
I am trying to assess how many patch news sites get over 1000 posts a year (based on date post created). I'm dealing with a 100m row table.
Can anyone see obvious reasons why the following fails to filter on _Threshold?
Count # Sites > 1000 Posts =
VAR _Threshold = 1000
VAR _Measure =
CALCULATE(
COUNTAX(
KEEPFILTERS(VALUES(AllContent[patch_id])),
[#Posts|Distinct]>_Threshold),
USERELATIONSHIP(DimDate[Date], AllContent[date_create]))
RETURN
_Measure
I am trying to avoid a more memory "expensive" virtual table measure such as the following, which takes so much memory I can't even tell if it works (but should).
Count # Sites > 1000 Posts =
VAR _Threshold = 1000
VAR _MyTable =
SUMMARIZE(AllContent, AllContent[patch_id], "@Posts", if( [#Posts|Distinct]>_Threshold,1,0 ))
RETURN
COUNTX(_mytable, [@Posts])
Solved! Go to Solution.
Hi @charleshale ,
According to your description, I create a sample.
If I set the threshold is 1, the expected result will be 2(A and B).
Here's my measure formula.
Count # Sites > 1 Posts =
VAR _Threshold = 1
VAR _COUNT =
CALCULATE (
DISTINCTCOUNT ( 'Table'[Patch_id] ),
FILTER (
'Table',
CALCULATE (
COUNT ( 'Table'[Merged] ),
ALLEXCEPT ( 'Table', 'Table'[Patch_id] )
) > _Threshold
)
)
RETURN
_COUNT
Get the expected result.
I attach my sample below for reference.
Best Regards,
Community Support Team _ kalyj
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi there. Thank you for the solution. I have edited slightly and have the following two alternate formulations that work. The following is the fastest on my giant table.
Count # Sites 2 =
VAR _Threshold = 98
VAR _MyTable =
FILTER ( VALUES ( 'Table'[Patch_id] ),
CALCULATE(SUM('Table'[Merged]),ALLEXCEPT('Table','Table'[Patch_id])) > _Threshold )
-- the above CALCULATE () IS A BASIC COLUMN COUNT THAT CAN BE REPLACED BY A [measure] >_Threshold
VAR _Result =
COUNTROWS ( _MyTable )
RETURN
_Result
The following is memory intensive but perhaps the simplest code
Count # Sites 3 =
VAR _Threshold = 98
VAR _MyTable =
SUMMARIZE('Table','Table'[Patch_id],
"@Posts", If (SUM([Merged]) > _Threshold ,1,0 ))
RETURN
sumx(_MyTable, [@Posts])
These yield the expcted result of 1 (ie only 1 patch ID -- G -- has the sum of Merged > 98.
Here is the file
Hi @charleshale ,
According to your description, I create a sample.
If I set the threshold is 1, the expected result will be 2(A and B).
Here's my measure formula.
Count # Sites > 1 Posts =
VAR _Threshold = 1
VAR _COUNT =
CALCULATE (
DISTINCTCOUNT ( 'Table'[Patch_id] ),
FILTER (
'Table',
CALCULATE (
COUNT ( 'Table'[Merged] ),
ALLEXCEPT ( 'Table', 'Table'[Patch_id] )
) > _Threshold
)
)
RETURN
_COUNT
Get the expected result.
I attach my sample below for reference.
Best Regards,
Community Support Team _ kalyj
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi there. Thank you for the solution. I have edited slightly and have the following two alternate formulations that work. The following is the fastest on my giant table.
Count # Sites 2 =
VAR _Threshold = 98
VAR _MyTable =
FILTER ( VALUES ( 'Table'[Patch_id] ),
CALCULATE(SUM('Table'[Merged]),ALLEXCEPT('Table','Table'[Patch_id])) > _Threshold )
-- the above CALCULATE () IS A BASIC COLUMN COUNT THAT CAN BE REPLACED BY A [measure] >_Threshold
VAR _Result =
COUNTROWS ( _MyTable )
RETURN
_Result
The following is memory intensive but perhaps the simplest code
Count # Sites 3 =
VAR _Threshold = 98
VAR _MyTable =
SUMMARIZE('Table','Table'[Patch_id],
"@Posts", If (SUM([Merged]) > _Threshold ,1,0 ))
RETURN
sumx(_MyTable, [@Posts])
These yield the expcted result of 1 (ie only 1 patch ID -- G -- has the sum of Merged > 98.
Here is the file
Hi @charleshale
Check [#Posts|Distinct]>_Threshold) in your formula.
When you use VALUES and the input parameter is a column name, returns a one-column table that contains the distinct values from the specified column. yoiu used AllContent[patch_id] in the values formula and then you used the another column (I think this is the issue).
Can you share a sample of your data and the expected result in a text format?
Appreciate your Kudos!!
LinkedIn: www.linkedin.com/in/vahid-dm/
Thanks, @VahidDM . It's certainly something like that.
The data table looks like this, simplified: the date of the post (AllContent[date_create]), a hashed title-date-author for the unique post (AllContent[Merged]), and the numerical id representing the particular patch.com local news site (AllContent[patch_id])
In this instance, let's say I set the Threshold to >=2, the measure would return the value 1 because only one site (10997 -- which happens to be the New York City Patch) would have more than 2 posts in the timeframe specified.
The 2nd formulation I did works but it is too memory intensive. Thanks for any thoughts!!
User | Count |
---|---|
79 | |
74 | |
42 | |
30 | |
28 |
User | Count |
---|---|
108 | |
96 | |
53 | |
48 | |
47 |