Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
myti
Helper II
Helper II

Filter outlier data rows from a data table in power bi

Hello guys,

 

I am a new user of power bi desktop and I have manged to do some tasks in power bi so far.

At the moment I am going to filter some recordes of data from transaction table before i use the data for other purposes. I have a transaction table which has many outlier data (e.g out of range order value ) .

I have created a formula to  filter them based on Standard deviation of the exisiting order value data. First, I would to sort the order values from max to min and then calculate STD  of those data.then i filter the max row one by one until the STD<200.I stucked to do this job in Power bi....Any soloution can help me to do this task?I want this procedure be done whenever the transaction table is updated with new data.

 

 

Thank you in advance

Mehdi

 

 

 

 

2 ACCEPTED SOLUTIONS

@myti What is the source of your data? And this sounds like something you would want to do on the import side of things, so i'd recommend you try to get some feedback from @ImkeF, she's the master of "M" related solutions.


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG

View solution in original post

@myti: You shouldn't mark your question as answered: If @Eno1978 Eno hadn't posted me in I wouldn't have looked at it 🙂

 

Fortunately, there is a function for Standard Deviation in M already. But you need a trick in order to apply it to the needed list / to create the lists with different length. Therefore you add an Index-column after you sorted your values. Then you create the lists by specifying all values from the column up to the current row. So your code would look like this:

 

let
    Source = Excel.CurrentWorkbook(){[Name="Tabelle2"]}[Content],
    #"Sorted Rows" = Table.Sort(Source,{{"List", Order.Ascending}}),
    #"Added Index" = Table.Buffer(Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1)),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each List.StandardDeviation({#"Added Index"[List]{0}..#"Added Index"[List]{[Index]}}))
in
    #"Added Custom"

Where "Tabelle2" is the source-table with column "List" with the values to sort. 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

12 REPLIES 12

@myti What is the source of your data? And this sounds like something you would want to do on the import side of things, so i'd recommend you try to get some feedback from @ImkeF, she's the master of "M" related solutions.


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG

My data is online sales data that I import from Azure database to power bi.Should I post my question in those communities?

 

thanks

@myti hmm. Is there a reason you don't just create a view on the SQL side? This is the right community, and the "M" would be applicable to filter the data prior to it being loaded if you were using import. But if you are using direct query, then I would just do everything in SQL. 

My initial assumption is that you were using some sort of flat file, so I recommended ImkeF. But if you are using SQL, I would suggest you just filter it down in a view, or write a query and execute the query in Power BI.


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG

Thanks  a lot @Seth_C_Bauer.

I will check your suggestion as well.Actully It is a good idea.But the project manager intrested to do all calculation  in PBI instead of Azure.

I would to join your MSBIWI,howere I am not staying in USA.... 🙂

@myti: You shouldn't mark your question as answered: If @Eno1978 Eno hadn't posted me in I wouldn't have looked at it 🙂

 

Fortunately, there is a function for Standard Deviation in M already. But you need a trick in order to apply it to the needed list / to create the lists with different length. Therefore you add an Index-column after you sorted your values. Then you create the lists by specifying all values from the column up to the current row. So your code would look like this:

 

let
    Source = Excel.CurrentWorkbook(){[Name="Tabelle2"]}[Content],
    #"Sorted Rows" = Table.Sort(Source,{{"List", Order.Ascending}}),
    #"Added Index" = Table.Buffer(Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1)),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each List.StandardDeviation({#"Added Index"[List]{0}..#"Added Index"[List]{[Index]}}))
in
    #"Added Custom"

Where "Tabelle2" is the source-table with column "List" with the values to sort. 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Thank you very much @ImkeF for your solution. Infact @Seth_C_Bauer did a great favor.

I just tried to applied your proposed M code.It seems correct and work well until indexing.However,I dont know why the new column  "custom" is filled with Error.

For example,I have a  a column of "ordervalues"  for 200 hundered records and for each value i would like to calculate the standard deviation of each value and other values less than the current value.I will be happy if you ca help me to sovle this issue.

 

 

Thank you,

Best Regards

 

 

 

let
    Source = Excel.CurrentWorkbook(){[Name="Tabelle2"]}[Content​],
    #"Sorted Rows" = Table.Sort(Source,{{"ordervalues", Order.Ascending}}),
    #"Added Index" = Table.Buffer(Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1)),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each List.StandardDeviation({#"Added Index"[ordervalues]{0}..#"Added Index"[ordervalues]{[Index]}}))
in
    #"Added Custom"

So you need to replace the column names in the code with yours. Tried it according to your description.

If this doesn't work, please post picture of your table or sample data. Please keep in mind that this is all case sensitive!! 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Thank you ver much for your fast response @ImkeF

 

Actully, I considered all the points that you have mentioned.But I still receive the same results.So i am sendig you the picture of my table which is only one column of ordervalues. and also the code i excute on those data.Thank you very much for your time and efforts:

 

M code:

 

let
Source = Excel.Workbook(File.Contents("D:\OneDrive\PowerBI documents\PowerQuery\Sample.xlsx"), null, true),
Sheet4_Sheet = Source{[Item="Sheet4",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet4_Sheet),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Order Value", type number}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Order Value", "OrderValue"}}),
#"Sorted Rows" = Table.Sort(#"Renamed Columns",{{"OrderValue", Order.Descending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each List.StandardDeviation({#"Added Index"[OrderValue]{0}..#"Added Index"[OrderValue]{[Index]}}))

in
#"Added Custom"

 

 

 

 

and efforts.2016-05-09 (2).png

What does the error-message say? (click on the blank space right beneath "error", in the same cell)

 

Try this code instead:

let
Source = Excel.Workbook(File.Contents("D:\OneDrive\PowerBI documents\PowerQuery\Sample.xlsx"), null, true),
Sheet4_Sheet = Source{[Item="Sheet4",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet4_Sheet),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Order Value", type number}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Order Value", "OrderValue"}}),
#"Sorted Rows" = Table.Buffer(Table.Sort(#"Renamed Columns",{{"OrderValue", Order.Ascending}})),
#"Added Index" = Table.Buffer(Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1)),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each List.StandardDeviation({#"Added Index"[OrderValue]{0}..#"Added Index"[OrderValue]{[Index]}}))
in
#"Added Custom"

I has Buffers and the sort-order is different. I just used this code & it worked for me.

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Thanks alot @ImkeF .

 

I have tried your code.I still receive the error. The error is "Expression.Error: The number is out of range of a 32 bit integer value.
Details:
0.03"

I think the problem is my values are decimals.The code works well without error if the values are whole number(integer). and I think the clalculated StandardDeviation values  are also not correct as I compared them with the values that normal excel produce.

I will be happy if you can help me to solve the issue.

 

Thanks

Myti

If it's "just" about catching some outliers, how about creating a custom column where you multiply your original values by 100 (or whatever factor needed in order to end up with significant values) and then round them. Although the standard deviations (divided by 100) will be a bit biased, but should be good enough to identify the most relevant outliers.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

yes...at the momement it is only cataching some outliers.

Actually our data got wide range of values that can be between 100,000 to 0.8.... Infact the values over  3000 or 4000 are outliers.Just I wanted to have systematic way for filtering them. 

 

Thank you very much,

Best Regards

Myti

Helpful resources

Announcements
July 2024 Power BI Update

Power BI Monthly Update - July 2024

Check out the July 2024 Power BI update to learn about new features.

July Newsletter

Fabric Community Update - July 2024

Find out what's new and trending in the Fabric Community.