Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
ryanjparks
Helper I
Helper I

Statistical Grouping and Averages in Dataset

Hey everyone,

 

I'm curious if there is a way to get statistical averages based on dividing datasets into groups? For instance, I have the following chart as an example:

 

ryanjparks_0-1728943938754.png

 

There are 18 customers buying a given widget, and I'd like to divide the dataset into groups of 3 based on Quantity purchased over a given Date filter, and then get the average price each for each group (and ranges of quantity, if I can manage it).

 

What would be the best way to accomplish this? I feel like StDev.P would come into play but my real data has very large disparity in quantities purchased so it's throwing off my calculation.

 

The goal is to be able to give my sales reps pricing guidelines given certain volume.

 

Thanks for any help!

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @ryanjparks 

 

Here's the sample data:

Table:

vzhengdxumsft_0-1729761083455.png

Then add a new measure:

Percent Rank =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    SUMX (
        FILTER ( _vtable, [_SALES] = SUM ( 'Table'[Sales           .] ) ),
        [Percent Rank]
    )

Next, add 3 measures:

Low Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0 && [Percent Rank] <= 0.33 ),
        [_AVG]
    )
Med Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0.34 && [Percent Rank] <= 0.66 ),
        [_AVG]
    )
High Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0.67 && [Percent Rank] <= 1 ),
        [_AVG]
    )

The result is as follow:

vzhengdxumsft_1-1729761244074.png

 

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

8 REPLIES 8
Anonymous
Not applicable

Hi @ryanjparks 

 

Here's the sample data:

Table:

vzhengdxumsft_0-1729761083455.png

Then add a new measure:

Percent Rank =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    SUMX (
        FILTER ( _vtable, [_SALES] = SUM ( 'Table'[Sales           .] ) ),
        [Percent Rank]
    )

Next, add 3 measures:

Low Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0 && [Percent Rank] <= 0.33 ),
        [_AVG]
    )
Med Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0.34 && [Percent Rank] <= 0.66 ),
        [_AVG]
    )
High Volume =
VAR _vtable =
    VAR _vtable =
        SUMMARIZE (
            ALLSELECTED ( 'Table' ),
            'Table'[Customer Name        .],
            "_SALES", SUM ( 'Table'[Sales           .] ),
            "_AVG", AVERAGE ( 'Table'[$ / ea        .] )
        )
    RETURN
        ADDCOLUMNS (
            _vtable,
            "Percent Rank",
                VAR CurrentSales = [_SALES]
                VAR TotalRows =
                    COUNTROWS ( _vtable )
                VAR _Rank =
                    RANKX ( _vtable, [_SALES],, DESC, DENSE )
                VAR _minvalue =
                    MINX ( _vtable, [_SALES] )
                RETURN
                    IF ( [_SALES] = _minvalue, 0, DIVIDE ( TotalRows - _Rank, TotalRows - 1 ) )
        )
RETURN
    AVERAGEX (
        FILTER ( _vtable, [Percent Rank] >= 0.67 && [Percent Rank] <= 1 ),
        [_AVG]
    )

The result is as follow:

vzhengdxumsft_1-1729761244074.png

 

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Okay, so I'm nearly there. I applied the Percent Rank to my live data and am close, but I am getting strange results.

 

Your example seems to be aggregating based on Sales and not Qty Sold, so I attempted to change this:

 

 

_Percent Rank = 
VAR _vtable = 
     VAR _vtable = SUMMARIZE(ALLSELECTED('Sales'),'Sales'[Customer Name],"_QTY",SUM('Sales'[Qty Sold]))
         RETURN 
             ADDCOLUMNS(_vtable,"Percent Rank",
             -- VAR CurrentQty = [_QTY]
             VAR TotalRows = COUNTROWS(_vtable)
             VAR _Rank = RANKX(_vtable, [_QTY], , DESC, Dense)
             VAR _minvalue = MINX(_vtable,[_QTY])
             RETURN IF([_QTY]=_minvalue,0,DIVIDE(TotalRows - _Rank, TotalRows - 1)))
RETURN SUMX(FILTER(_vtable,[_QTY]=SUM('Sales'[Qty Sold])),[Percent Rank])

 

 

When filtered for a specific item, it is giving me results like this:

 

ryanjparks_0-1729873658357.png

 

I can't quite figure out why it is giving results above 1 (100%), but it seems to be aggregating some of the results together. When I get rid RANK instead of the PERCENT RANK (by deleting the Divide function and just giving the _Rank result), it is multiplying the rank by the number of times the duplicate Qty Sold is found in the table. For instance, what should be Rank #17 is doubled into Rank #34 because there are two customers with a Qty Sold of 410. There are others where there are 10 Customers with the same Qty Sold and the Rank result is multiplied by 10.

 

I also tested it in the PBI file you attached by adding another Customer 19 row and adding the same Qty Sold:

 

ryanjparks_0-1729887789982.png

 

How do I make it so duplicate results are evaluated on their own instead multiplying the rank result by the number times the same quantity is found for different customers?

 

ryanjparks
Helper I
Helper I

Okay, I could not figure it out after spending too much time on this. So here goes with details that are in a usable format:

 

Below is a table with sales for two items: ItemID DR1001 & ZA5009 (made up items). Every row represents a sale made to a customer for a specific item. There are many rows to a single invoice in this table

 

Customer Name        .ItemID        .Sales           .Qty Sold          .$ / ea        .
Customer 1DR1001$54,9901833$30
Customer 10DR1001$35,0001000$35
Customer 11DR1001$33,985971$35
Customer 15DR1001$26,000650$40
Customer 12DR1001$33,005943$35
Customer 10DR1001$34,055973$35
Customer 15DR1001$19,000475$40
Customer 16DR1001$14,985333$45
Customer 4DR1001$43,9891419$31
Customer 7DR1001$39,9961212$33
Customer 16DR1001$9,000200$45
Customer 17DR1001$9,990222$45
Customer 18ZA5009$10,980244$45
Customer 6DR1001$37,0141194$31
Customer 13DR1001$21,000525$40
Customer 14DR1001$22,000550$40
Customer 2DR1001$54,0001800$30
Customer 9ZA5009$33,9901030$33
Customer 2DR1001$51,0001700$30
Customer 3DR1001$51,9901733$30
Customer 8DR1001$39,0061182$33
Customer 11DR1001$33,985971$35
Customer 17DR1001$13,995311$45
Customer 3DR1001$50,0101667$30
Customer 18ZA5009$9,990222$45
Customer 4DR1001$41,9741354$31
Customer 5ZA5009$41,0131323$31
Customer 13DR1001$28,000700$40
Customer 14DR1001$20,000500$40
Customer 1DR1001$51,9901733$30
Customer 5ZA5009$38,0061226$31
Customer 12DR1001$31,010886$35
Customer 6DR1001$42,0051355$31
Customer 7DR1001$34,9801060$33
Customer 8DR1001$36,0031091$33
Customer 9ZA5009$38,0161152$33

 

I need to be able to use a Slicer to filter for the desired ItemID.....Easy enough, so let's say I create a slicer and select ItemID "DR1001". I need the data to be displayed like this in the table output on the page:

 

Customer Name      .Sum of Sales     .Sum of Qty Sold     .Avg of $ / ea     .Percent Rank     .
Customer 1$106,980           3,566$30.00100.00%
Customer 2$105,000           3,500$30.0092.80%
Customer 3$102,000           3,400$30.0085.70%
Customer 4$85,963           2,773$31.0078.50%
Customer 6$79,019           2,549$31.0071.40%
Customer 7$74,976           2,272$33.0057.10%
Customer 8$75,009           2,273$33.0064.20%
Customer 10$69,055           1,973$35.0050.00%
Customer 11$67,970           1,942$35.0042.80%
Customer 12$64,015           1,829$35.0035.70%
Customer 13$49,000           1,225$40.0028.50%
Customer 14$42,000           1,050$40.0014.20%
Customer 15$45,000           1,125$40.0021.40%
Customer 16$23,985               533$45.000.00%
Customer 17$23,985               533$45.000.00%

 

Percent Rank is what I'm looking for, based on totals grouped by Customer Name. It needs to be calculated just like Excel does, where I can then create 3 separate Measures that will Group the resulting table data into 3 categories: Low volume (0-33%), Medium volume (34-66%) and High Volume (67-100%). I will then get a Weighted Average sale Price Each for each grouping, which would calculate the Sum of Total Sales / Sum of Qty Sold. I'd then place those measures in Cards which would display the Low, Medium and High volume average sales price each for the sales reps to see, as a guideline to volume based pricing. It would look like this:

 

ryanjparks_2-1729542130857.png

So far, no matter what I do I cannot get a percent rank per Customer Name based on the Quantity Sold against all the other customers. I believe the reason for this is because I am using summed data and not single, individual rows (but I am not sure about this). When I have done a COUNTROWS on the data, it is counting the number of rows found in the source data table, not the number of rows resulting in the output table. And further, I could not get a proper comparison in the total Qty Sold for a given customer against a total Qty Sold for a different customer in the output table.

 

Thanks for any help.

ryanjparks
Helper I
Helper I

SNIP

Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).

Do not include sensitive information. Do not include anything that is unrelated to the issue or question.

Need help uploading data? https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...

Please show the expected outcome based on the sample data you provided.

Want faster answers? https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447...

lbendlin
Super User
Super User

Is the number of buckets fixed? What are the buckets based on?  Exclusively on Sales Value ranges?  Why not on item value ranges? Or profitability?  You mention pricing guidelines based on volume - is that Quantity sold?

 

Sounds like PercentileX may be another option.

The number of buckets are fixed at 3: Low, Middle and High (Qty Sold volume)....however, the size of the buckets (or rather number of customers in each bucket) will vary based on the number of customers who have bought the product in the last year.

 

Item Value (Each) is what I am trying to get the weighted average for each volume bucket. The reason I'm not going for Item Value ranges is because there are customers in all 3 buckets that are at the extreme ranges of the entire price scale, so it adds no value. But if I take the weighted average of each bucket, that gives my sales reps a better idea of where price is relative to annual volume ranges.

 

I will look into PercentileX....that sounds promising, thanks!

Anonymous
Not applicable

Hi @ryanjparks 

 

Have you solved your problem? If so, can you share your solution here and mark the correct answer as a standard answer to help other members find it faster? Thank you very much for your kind cooperation!

 

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.