Re: QUARTILE.EXC

Greg_Deckler · ‎05-03-2020

In my recent quest to create or catalog as many DAX equivalents for Excel functions, this just adds to the struggles with Excel's QUARTILE function. So, in numerous locations, such as this; not that...God forbid, Quora is authoritative about anything, but in mulitiple places there are two documented methods of calculating 1st and 3rd quartiles and they all seem to essentially agree about the general nature of those two methods. So I implemented the exclusive method here but the results seem to come out to more along the lines of the inclusive method, not that I trust Excel's calculations of those either. So, who knows at this point. Here it is though.

QUARTILE.EXC = 
    VAR __Values = SELECTCOLUMNS('Table',"Values",[Column1])
    VAR __Quart = MAX('Quartiles'[Quart])
    VAR __Median = MEDIANX(__Values,[Values])
    VAR __Count = COUNTROWS(__Values)
    VAR __Quartile =
        SWITCH(__Quart,
            0,MINX(__Values,[Values]),
            2,__Median,
            4,MAXX(__Values,[Values]),
            1,
                VAR __Median = IF(
                                    ISEVEN(__Count),
                                    MEDIANX(FILTER(__Values,[Values] < __Median),[Values]),
                                    MEDIANX(FILTER(__Values,[Values] <= __Median),[Values])
                                )
                RETURN __Median,
            3,
                VAR __Median = IF(
                                    ISEVEN(__Count),
                                    MEDIANX(FILTER(__Values,[Values] > __Median),[Values]),
                                    MEDIANX(FILTER(__Values,[Values] >= __Median),[Values])
                                )
                RETURN __Median
        )
RETURN
    __Quartile

All I can say is that apparently either everyone else in the entire world (as far as I can find) is wrong about how to calculate quartiles or...maybe I am missing something.

Something else that bugs me, all of the documentation on QUARTILE.INC, QUARTILE.EXC, PERCENTILE.INC, PERCENTILE.EXC all focus on the "inclusive/exclusive" part about the kth values from 0..1. Except that seems like the least important part to me because there are clearly different methods going on here in terms of how these functions compute the quartiles/percentiles because you can get very different answers, especially when dealing with even numbers of items. The fact that you can't use 0 and 1 in one of them seems like the last thing that you would want to explain but rather explain why the calculated values are different?

And another thing with regard to the "interpolation", apparently that is why the numbers generated for the 1st and 3rd quartiles in Excel varies from the way everybody else does it so how exactly is this interpolation happening and why is it better or worse than the way everyone else seems to do it?

eyJrIjoiNzQxZTc1ZDgtMmE2Ni00NDE0LWExNjktZWJiMzBhZTk3Y2UyIiwidCI6IjRhMDQyNzQzLTM3M2EtNDNkMi04MjdiLTAwM2Y0YzdiYTFlNSIsImMiOjN9

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎05-03-2020

Excel interpolates whereas your code doesn't.

"If the requested quartile falls between two of the values in the supplied array, Excel interpolates between these values to calculate the quartile value."

https://www.excelfunctions.net/excel-quartile-exc-function.html

Best
D

Greg_Deckler · ‎05-03-2020

I understand that @Anonymous, I just couldn't figure out how but now I think I understand how it is doing the interpolation but I can't decide if that is the correct way it should be doing it because it doesn't seem to match up with the way anybody else does it that I can find. And it gets really bizarre when you consider QUARTILE and QUARTILE.INC for this problem.

So if I understand what it is doing (Excel), it uses the standard quartile 1 formula, 1/4 * (n + 1) to find the "rank" for the first quartile. This produces 2.25 meaning that it should interpolate between 2nd and 3rd numbers which are 2 and 4. If you shortcut the math, .25 of 2 is .5 so 2 + .5 is 2.5. Same with quartile 3, 3/4 * (n + 1) would be 6.75 so interpolate between the 6th and 7th numbers, which are 9 and 10, shortcut .75 * 1 = .75 so 9 + .75 = 9.75. All's fair here until you realize that QUARTILE and QUARTILE.INC in Excel return 3.5 for quartile 1 and 9.25 for quartile 3. This is really where I am struggling because I don't see how you get to those numbers unless for some reason you are interpolating from the higher number for .INC and interpolating like normal people interpolate for the .EXC versions. In other words, the interpolation is .5 for quartile 1 so you say 4 - .5 = 3.5 and 10 - .75 = 9.25. But the question has to be why? Why are you interpolating in two completely different ways? That's the real difference between .INC and .EXC if that is the case. And I still don't think that either method is what is done in best practices at least as far as I can tell from researching this on the Internet.

So, I understand interpolation, I had to do enough of it in thermodynamics and fluid dynamics to last me a lifetime. What I don't understand is how Excel is doing its interpolation because it doesn't make any sense and it seems to do it one way sometimes and another way other times.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎05-03-2020

https://peltiertech.com/quartiles/

Greg_Deckler · ‎05-03-2020

OK, one versus:

I could go on, it's like the first three pages of search results. But, the important question is how in the world is the difference in interpolation techniques not documented in the explanation of these functions PERCENTILE.INC, PERCENTILE.EXC, QUARTILE.INC and QUARTILE.EXC but instead focuses on what values of "k" you can pass into them? I mean, it makes no sense. How does INC translate into N-1 and EXC translate into N+1 and why not just use N as John Peltier himself states?

".INC is the same as Excel’s “inclusive” (N-1) legacy functions, and .EXC is the same as everyone else’s “exclusive” (N+1) functions. I don’t really know why anyone would pick either (N-1) or (N+1) over the other, or why they’d pick either of these over the N-basis calculations."

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎05-03-2020

It's a question for M$. After all, you can contact them and ask. But does it really matter? Since there are slightly different definitions and they are all asymptotically the same, I wouldn't make a fuss about it. That there is no documentation is a different matter, though.

Best
D