cancel
Showing results for
Did you mean:

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Helper II

## Filter outlier data rows from a data table in power bi

Hello guys,

I am a new user of power bi desktop and I have manged to do some tasks in power bi so far.

At the moment I am going to filter some recordes of data from transaction table before i use the data for other purposes. I have a transaction table which has many outlier data (e.g out of range order value ) .

I have created a formula to  filter them based on Standard deviation of the exisiting order value data. First, I would to sort the order values from max to min and then calculate STD  of those data.then i filter the max row one by one until the STD<200.I stucked to do this job in Power bi....Any soloution can help me to do this task?I want this procedure be done whenever the transaction table is updated with new data.

Mehdi

2 ACCEPTED SOLUTIONS
MVP

@myti What is the source of your data? And this sounds like something you would want to do on the import side of things, so i'd recommend you try to get some feedback from @ImkeF, she's the master of "M" related solutions.

Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG
Super User

@myti: You shouldn't mark your question as answered: If @Eno1978 Eno hadn't posted me in I wouldn't have looked at it 🙂

Fortunately, there is a function for Standard Deviation in M already. But you need a trick in order to apply it to the needed list / to create the lists with different length. Therefore you add an Index-column after you sorted your values. Then you create the lists by specifying all values from the column up to the current row. So your code would look like this:

```let
Source = Excel.CurrentWorkbook(){[Name="Tabelle2"]}[Content],
#"Sorted Rows" = Table.Sort(Source,{{"List", Order.Ascending}}),
in

Where "Tabelle2" is the source-table with column "List" with the values to sort.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

12 REPLIES 12
MVP

@myti What is the source of your data? And this sounds like something you would want to do on the import side of things, so i'd recommend you try to get some feedback from @ImkeF, she's the master of "M" related solutions.

Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG
Helper II

My data is online sales data that I import from Azure database to power bi.Should I post my question in those communities?

thanks

MVP

@myti hmm. Is there a reason you don't just create a view on the SQL side? This is the right community, and the "M" would be applicable to filter the data prior to it being loaded if you were using import. But if you are using direct query, then I would just do everything in SQL.

My initial assumption is that you were using some sort of flat file, so I recommended ImkeF. But if you are using SQL, I would suggest you just filter it down in a view, or write a query and execute the query in Power BI.

Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG
Helper II

Thanks  a lot @Seth_C_Bauer.

I will check your suggestion as well.Actully It is a good idea.But the project manager intrested to do all calculation  in PBI instead of Azure.

I would to join your MSBIWI,howere I am not staying in USA.... 🙂

Super User

@myti: You shouldn't mark your question as answered: If @Eno1978 Eno hadn't posted me in I wouldn't have looked at it 🙂

Fortunately, there is a function for Standard Deviation in M already. But you need a trick in order to apply it to the needed list / to create the lists with different length. Therefore you add an Index-column after you sorted your values. Then you create the lists by specifying all values from the column up to the current row. So your code would look like this:

```let
Source = Excel.CurrentWorkbook(){[Name="Tabelle2"]}[Content],
#"Sorted Rows" = Table.Sort(Source,{{"List", Order.Ascending}}),
in

Where "Tabelle2" is the source-table with column "List" with the values to sort.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

Helper II

Thank you very much @ImkeF for your solution. Infact @Seth_C_Bauer did a great favor.

I just tried to applied your proposed M code.It seems correct and work well until indexing.However,I dont know why the new column  "custom" is filled with Error.

For example,I have a  a column of "ordervalues"  for 200 hundered records and for each value i would like to calculate the standard deviation of each value and other values less than the current value.I will be happy if you ca help me to sovle this issue.

Thank you,

Best Regards

Super User
```let
Source = Excel.CurrentWorkbook(){[Name="Tabelle2"]}[Content​],
#"Sorted Rows" = Table.Sort(Source,{{"ordervalues", Order.Ascending}}),
in

So you need to replace the column names in the code with yours. Tried it according to your description.

If this doesn't work, please post picture of your table or sample data. Please keep in mind that this is all case sensitive!!

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

Helper II

Thank you ver much for your fast response @ImkeF

Actully, I considered all the points that you have mentioned.But I still receive the same results.So i am sendig you the picture of my table which is only one column of ordervalues. and also the code i excute on those data.Thank you very much for your time and efforts:

M code:

let
Source = Excel.Workbook(File.Contents("D:\OneDrive\PowerBI documents\PowerQuery\Sample.xlsx"), null, true),
Sheet4_Sheet = Source{[Item="Sheet4",Kind="Sheet"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Order Value", type number}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Order Value", "OrderValue"}}),
#"Sorted Rows" = Table.Sort(#"Renamed Columns",{{"OrderValue", Order.Descending}}),

in

and efforts.

Super User

What does the error-message say? (click on the blank space right beneath "error", in the same cell)

```let
Source = Excel.Workbook(File.Contents("D:\OneDrive\PowerBI documents\PowerQuery\Sample.xlsx"), null, true),
Sheet4_Sheet = Source{[Item="Sheet4",Kind="Sheet"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Order Value", type number}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Order Value", "OrderValue"}}),
#"Sorted Rows" = Table.Buffer(Table.Sort(#"Renamed Columns",{{"OrderValue", Order.Ascending}})),
in

I has Buffers and the sort-order is different. I just used this code & it worked for me.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

Helper II

Thanks alot @ImkeF .

I have tried your code.I still receive the error. The error is "Expression.Error: The number is out of range of a 32 bit integer value.
Details:
0.03"

I think the problem is my values are decimals.The code works well without error if the values are whole number(integer). and I think the clalculated StandardDeviation values  are also not correct as I compared them with the values that normal excel produce.

I will be happy if you can help me to solve the issue.

Thanks

Myti

Super User

If it's "just" about catching some outliers, how about creating a custom column where you multiply your original values by 100 (or whatever factor needed in order to end up with significant values) and then round them. Although the standard deviations (divided by 100) will be a bit biased, but should be good enough to identify the most relevant outliers.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

Helper II

yes...at the momement it is only cataching some outliers.

Actually our data got wide range of values that can be between 100,000 to 0.8.... Infact the values over  3000 or 4000 are outliers.Just I wanted to have systematic way for filtering them.

Thank you very much,

Best Regards

Myti