Solved: Statistical Grouping and Averages in Dataset

ryanjparks · ‎10-14-2024

Hey everyone,

I'm curious if there is a way to get statistical averages based on dividing datasets into groups? For instance, I have the following chart as an example:

There are 18 customers buying a given widget, and I'd like to divide the dataset into groups of 3 based on Quantity purchased over a given Date filter, and then get the average price each for each group (and ranges of quantity, if I can manage it).

What would be the best way to accomplish this? I feel like StDev.P would come into play but my real data has very large disparity in quantities purchased so it's throwing off my calculation.

The goal is to be able to give my sales reps pricing guidelines given certain volume.

Thanks for any help!

Anonymous · ‎10-24-2024

Hi @ryanjparks

Here's the sample data:

Table:

Then add a new measure:

Percent Rank =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    SUMX (
        FILTER ( _vtable, [_SALES] = SUM ( 'Table'[Sales           .] ) ),
        [Percent Rank]
    )

Next, add 3 measures:

Low Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0 && [Percent Rank] <= 0.33 ),
        [_AVG]
    )

Med Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0.34 && [Percent Rank] <= 0.66 ),
        [_AVG]
    )

High Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0.67 && [Percent Rank] <= 1 ),
        [_AVG]
    )

The result is as follow:

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Anonymous · ‎10-24-2024

Hi @ryanjparks

Here's the sample data:

Table:

Then add a new measure:

Percent Rank =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    SUMX (
        FILTER ( _vtable, [_SALES] = SUM ( 'Table'[Sales           .] ) ),
        [Percent Rank]
    )

Next, add 3 measures:

Low Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0 && [Percent Rank] <= 0.33 ),
        [_AVG]
    )

Med Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0.34 && [Percent Rank] <= 0.66 ),
        [_AVG]
    )

High Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0.67 && [Percent Rank] <= 1 ),
        [_AVG]
    )

The result is as follow:

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

ryanjparks · ‎10-25-2024

Okay, so I'm nearly there. I applied the Percent Rank to my live data and am close, but I am getting strange results.

Your example seems to be aggregating based on Sales and not Qty Sold, so I attempted to change this:

_Percent Rank = 
VAR _vtable = 
     VAR _vtable = SUMMARIZE(ALLSELECTED('Sales'),'Sales'[Customer Name],"_QTY",SUM('Sales'[Qty Sold]))
         RETURN 
             ADDCOLUMNS(_vtable,"Percent Rank",
             -- VAR CurrentQty = [_QTY]
             VAR TotalRows = COUNTROWS(_vtable)
             VAR _Rank = RANKX(_vtable, [_QTY], , DESC, Dense)
             VAR _minvalue = MINX(_vtable,[_QTY])
             RETURN IF([_QTY]=_minvalue,0,DIVIDE(TotalRows - _Rank, TotalRows - 1)))
RETURN SUMX(FILTER(_vtable,[_QTY]=SUM('Sales'[Qty Sold])),[Percent Rank])

When filtered for a specific item, it is giving me results like this:

I can't quite figure out why it is giving results above 1 (100%), but it seems to be aggregating some of the results together. When I get rid RANK instead of the PERCENT RANK (by deleting the Divide function and just giving the _Rank result), it is multiplying the rank by the number of times the duplicate Qty Sold is found in the table. For instance, what should be Rank #17 is doubled into Rank #34 because there are two customers with a Qty Sold of 410. There are others where there are 10 Customers with the same Qty Sold and the Rank result is multiplied by 10.

I also tested it in the PBI file you attached by adding another Customer 19 row and adding the same Qty Sold:

How do I make it so duplicate results are evaluated on their own instead multiplying the rank result by the number times the same quantity is found for different customers?

ryanjparks · ‎10-21-2024

Okay, I could not figure it out after spending too much time on this. So here goes with details that are in a usable format:

Below is a table with sales for two items: ItemID DR1001 & ZA5009 (made up items). Every row represents a sale made to a customer for a specific item. There are many rows to a single invoice in this table

Customer Name .	ItemID .	Sales .	Qty Sold .	$ / ea .
Customer 1	DR1001	$54,990	1833	$30
Customer 10	DR1001	$35,000	1000	$35
Customer 11	DR1001	$33,985	971	$35
Customer 15	DR1001	$26,000	650	$40
Customer 12	DR1001	$33,005	943	$35
Customer 10	DR1001	$34,055	973	$35
Customer 15	DR1001	$19,000	475	$40
Customer 16	DR1001	$14,985	333	$45
Customer 4	DR1001	$43,989	1419	$31
Customer 7	DR1001	$39,996	1212	$33
Customer 16	DR1001	$9,000	200	$45
Customer 17	DR1001	$9,990	222	$45
Customer 18	ZA5009	$10,980	244	$45
Customer 6	DR1001	$37,014	1194	$31
Customer 13	DR1001	$21,000	525	$40
Customer 14	DR1001	$22,000	550	$40
Customer 2	DR1001	$54,000	1800	$30
Customer 9	ZA5009	$33,990	1030	$33
Customer 2	DR1001	$51,000	1700	$30
Customer 3	DR1001	$51,990	1733	$30
Customer 8	DR1001	$39,006	1182	$33
Customer 11	DR1001	$33,985	971	$35
Customer 17	DR1001	$13,995	311	$45
Customer 3	DR1001	$50,010	1667	$30
Customer 18	ZA5009	$9,990	222	$45
Customer 4	DR1001	$41,974	1354	$31
Customer 5	ZA5009	$41,013	1323	$31
Customer 13	DR1001	$28,000	700	$40
Customer 14	DR1001	$20,000	500	$40
Customer 1	DR1001	$51,990	1733	$30
Customer 5	ZA5009	$38,006	1226	$31
Customer 12	DR1001	$31,010	886	$35
Customer 6	DR1001	$42,005	1355	$31
Customer 7	DR1001	$34,980	1060	$33
Customer 8	DR1001	$36,003	1091	$33
Customer 9	ZA5009	$38,016	1152	$33

I need to be able to use a Slicer to filter for the desired ItemID.....Easy enough, so let's say I create a slicer and select ItemID "DR1001". I need the data to be displayed like this in the table output on the page:

Customer Name .	Sum of Sales .	Sum of Qty Sold .	Avg of $ / ea .	Percent Rank .
Customer 1	$106,980	3,566	$30.00	100.00%
Customer 2	$105,000	3,500	$30.00	92.80%
Customer 3	$102,000	3,400	$30.00	85.70%
Customer 4	$85,963	2,773	$31.00	78.50%
Customer 6	$79,019	2,549	$31.00	71.40%
Customer 7	$74,976	2,272	$33.00	57.10%
Customer 8	$75,009	2,273	$33.00	64.20%
Customer 10	$69,055	1,973	$35.00	50.00%
Customer 11	$67,970	1,942	$35.00	42.80%
Customer 12	$64,015	1,829	$35.00	35.70%
Customer 13	$49,000	1,225	$40.00	28.50%
Customer 14	$42,000	1,050	$40.00	14.20%
Customer 15	$45,000	1,125	$40.00	21.40%
Customer 16	$23,985	533	$45.00	0.00%
Customer 17	$23,985	533	$45.00	0.00%

Percent Rank is what I'm looking for, based on totals grouped by Customer Name. It needs to be calculated just like Excel does, where I can then create 3 separate Measures that will Group the resulting table data into 3 categories: Low volume (0-33%), Medium volume (34-66%) and High Volume (67-100%). I will then get a Weighted Average sale Price Each for each grouping, which would calculate the Sum of Total Sales / Sum of Qty Sold. I'd then place those measures in Cards which would display the Low, Medium and High volume average sales price each for the sales reps to see, as a guideline to volume based pricing. It would look like this:

So far, no matter what I do I cannot get a percent rank per Customer Name based on the Quantity Sold against all the other customers. I believe the reason for this is because I am using summed data and not single, individual rows (but I am not sure about this). When I have done a COUNTROWS on the data, it is counting the number of rows found in the source data table, not the number of rows resulting in the output table. And further, I could not get a proper comparison in the total Qty Sold for a given customer against a total Qty Sold for a different customer in the output table.

Thanks for any help.

ryanjparks · ‎10-21-2024

SNIP

lbendlin · ‎10-21-2024

Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).

Do not include sensitive information. Do not include anything that is unrelated to the issue or question.

Need help uploading data? https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...

Please show the expected outcome based on the sample data you provided.

Want faster answers? https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447...

lbendlin · ‎10-14-2024

Is the number of buckets fixed? What are the buckets based on? Exclusively on Sales Value ranges? Why not on item value ranges? Or profitability? You mention pricing guidelines based on volume - is that Quantity sold?

Sounds like PercentileX may be another option.

ryanjparks · ‎10-15-2024

The number of buckets are fixed at 3: Low, Middle and High (Qty Sold volume)....however, the size of the buckets (or rather number of customers in each bucket) will vary based on the number of customers who have bought the product in the last year.

Item Value (Each) is what I am trying to get the weighted average for each volume bucket. The reason I'm not going for Item Value ranges is because there are customers in all 3 buckets that are at the extreme ranges of the entire price scale, so it adds no value. But if I take the weighted average of each bucket, that gives my sales reps a better idea of where price is relative to annual volume ranges.

I will look into PercentileX....that sounds promising, thanks!

Anonymous · ‎10-20-2024

Hi @ryanjparks

Have you solved your problem? If so, can you share your solution here and mark the correct answer as a standard answer to help other members find it faster? Thank you very much for your kind cooperation!

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Statistical Grouping and Averages in Dataset

Helpful resources

Fabric Community Update - July 2025

Power BI Monthly Update - July 2025

Join us at FabCon Vienna from September 15-18, 2025

Statistical Grouping and Averages in Dataset

Helpful resources

Fabric Community Update - July 2025

Power BI Monthly Update - July 2025