Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
EMark
Frequent Visitor

Determine and filter Outliers Dynamically - Feasible?

Hi guys,

 

I spent a fairly good time on this task (reading a lot of articles, forums, watching videos, countless trials and errors) without getting anywhere close to the right solution. Maybe there is a way to achieve what is required and I’m just stuck in a loop and doesn’t come to me. I will be happy if someone has done something similar and could advise on this, or if someone has an idea which I haven’t think about and haven’t tried.

 

These are the conditions:

 

  1. We have a cohort of customers (quite a few thousands) with customer IDs who spent certain amount of money. These customers can be grouped by different dimensions like: Registration date, Registration Week, Registration Month, Country, Mobile Platform (Android, iOS, Windows, etc..). For each customer ID (one per row) we have all the mentioned dimensions and one column with the revenue.
  2. What is required is to calculate and flag those customers who are outliers in terms of the revenue they generated. We need to be able by using a slicer/filter (outlier – Y, N) to include or exclude them from the report.
  3. As I method to determine who is an outlier we use Average + 3*STDEV.P over the revenue to establish the threshold and every customer who is above the thresh we consider as an outlier. Sounds easy and straightforward.

 

Now the fun part

 

  1. We cannot calculate this in advance neither in the SQL query, nor by adding to the table in the data model a calculated column. Actually, we can, but the result will be incorrect. Because we need to determine the outliers considering different conditions/combinations/filters:
  1. Outliers over the whole population for all times
  2. Outliers within a given date - the thresholds will be different and also the customers who registered within this given period will be different comparing them to all periods for example
  3. Outliers from a given country
  4. Or combinations – Outliers from France who use iOS and registered in May 2021 – the threshold will be different compared to the one calculated over all clients

 

All the above cannot be precalculated, flagged and used as a filter in the data model table or in the SQL query. There are more than 8 dimensions with loads of positions in each dimension (like tons of dates, many countries etc.) which takes us to numerous combinations. This needs to be calculated as a measure, to work dynamically and the result to be conditioned by the combination of dimensions we picked in the visual or to be determined by the filters selection. We use dynamic M query parameters, and from a slicer you can select which dimensions you want to visualise in a matrix for example.

 

  1. I created a measure which works, but it works only if the customer IDs are listed in a matrix visual, however if we want to show just the number of clients (their count) group by country and month the measure stops function, it doesn’t give an error, but simply is not working.
  2. Another thing is we cannot use a measure in a slicer as we know, however I created a switch function which was supposed to work as a filter, however it works only if in the visual you have listed the client IDs and the revenue column, if you add one dimension for example the Switch (filter) doesn’t affect what is visualised in the matrix. And again, if you do a customer count and remove the IDs from the matrix the filter is not working.

 

The bottom line is – if you only add the customer IDs and the revenue columns in a matrix visual then the measure works when you pick different filters. If you want to do summaries groupped by dimensions showing the count of the customers - the measure doesn’t function. IF you create two matrices – One summarizing with dimensions, counts etc, and another matrix with the customers listed with their IDs with the idea to use it to filter the summary one, you cannot - it doesn’t work, because you use a measure result to filter what is visualised in the second matrix and this doesn't affect the first matrix (the summary one).

 

I thought that it would be great if a function like this existed (but it doesn’t 😊) – the outcome of a measure results in a creation of a virtual temp table which result can be conditioned by the filters selection we do dynamically, and we link this table using the client ID to the main one using it as a filter.

 

Apologies for the long post. If there is something unclear please ask. I will be happy if there is a solution that can be applied in this case.

 

Thanks

4 REPLIES 4
EMark
Frequent Visitor

The thread can be close/deleted/locked, it is not actual anymore. The issue has been solved by a colleague of mine.

Hi @EMark ,

Could you share the results of your research and that of your coworkers? It may help other members on the forum, thank you!

 

 

Best Regards

Yilong Zhou

EMark
Frequent Visitor

If it will be helpful below you can see how the measures were built.

 

This is how the threshold is calculated:

Threshold =
CALCULATE (
AVERAGE ( Query1[Revenue] ) + (STDEV.P ( Query1[Revenue] ) * 3 ),
ALL(Query1[client_ID]),
Query1[Revenue_Flag])="Positive")

 

This is how it is determined if a customer is an outlier:

Outlier =

 IF(SELECTEDVALUE(Query1[Revenue_Flag])="Positive" && SUM(Query1[Revenue])>[Threshold ], "Y" ,"N")

Hi @EMark ,

As you mentioned, I think the measures can help you, you can use it to have a try. I have another idea, Power BI offers analytics features that might help in identifying outliers without the need for complex DAX formulas. For instance, the "Analyze" feature in Power BI Desktop allows you to find where distributions are different, which can be a starting point for identifying outliers.

You can read this document for a further study: Use insights to find where distribution is different - Power BI | Microsoft Learn

 

 

 

Best Regards

Yilong Zhou

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.