
Anonymous
Not applicable

Convert weighted list to vector or list

I have a dataset of thousands of products with millions of transactions, where most transactions have QTY <> 1.  One metric I'm attempting to calculate is the skewness of a product's price over time, which requires transforming a dataset like:

 

Product   Price   Qty
A         $0.90   2
A         $1.00   1
A         $1.10   1
B         $1.00   1

 

List for A:

$0.90, $0.90, $1.00, $1.10

 

This way, I can derive:

  • Mean = $0.975
  • Median = $0.95
  • Mode = $0.90
  • Std Dev = 0.082916
  • Pearson's 1st Skewness = 0.90453
  • Pearson's 2nd Skewness = 0.90453

 

Any recommendations on how DAX can produce a list that repeats values based on a second column?  I'm hesitant to build out a second fact table with this transformation because it will become ridiculously long.  Creating a summarized table using SQL or M won't work because the list needs to respond to filters throughout the rest of the report (e.g. product, customer, transaction date, etc.).  Mean and Mode are easy enough to calculate from the dataset as-is, but I need the transformed data to calculate a proper median and standard deviation.
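For reference, the target transformation and statistics above can be checked with a short script.  A minimal sketch in Python (which the poster mentions as an option later in the thread), using only the standard library and the example table:

```python
import statistics

# Example rows from the table above: (Product, Price, Qty).
rows = [("A", 0.90, 2), ("A", 1.00, 1), ("A", 1.10, 1), ("B", 1.00, 1)]

# Repeat each price Qty times for product A -- the "List for A" above.
prices = [price for prod, price, qty in rows if prod == "A" for _ in range(qty)]
# prices == [0.90, 0.90, 1.00, 1.10]

mean = statistics.mean(prices)        # 0.975
median = statistics.median(prices)    # 0.95
mode = statistics.mode(prices)        # 0.90
stdev = statistics.pstdev(prices)     # population std dev, ~0.082916
skew1 = (mean - mode) / stdev         # Pearson's 1st, ~0.90453
skew2 = 3 * (mean - median) / stdev   # Pearson's 2nd, ~0.90453
```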

1 ACCEPTED SOLUTION

That is a lot of rows.  I thought more about this and came up with a way in DAX to build your table virtually (the approach could also be adapted to create it physically as a DAX calculated table). 

 

I know you'll probably stick with R or Python, but this was a fun challenge, so I worked on it further.  Please see the measures below.  Mean was straightforward and Mode a little trickier, but Median and StdDev are the ones that use the virtual table.  Pearson's skewness is then calculated from mean, median, and stddev (hopefully that approach works for you).

 

Mean =
SUMX ( Data, Data[Price] * Data[Qty] )
    / SUM ( Data[Qty] )

Mode =
TOPN ( 1, VALUES ( Data[Price] ), CALCULATE ( SUM ( Data[Qty] ) ) )

Median = 
VAR numberofvalues =
    SUM ( Data[Qty] )
VAR sequencetable =
    ADDCOLUMNS (
        SUMMARIZE ( Data, Data[Price], Data[Qty] ),
        "minsequence",
        VAR thisprice = Data[Price]
        RETURN
            CALCULATE (
                SUM ( Data[Qty] ),
                ALL ( Data[Price], Data[Qty] ),
                Data[Price] < thisprice
            ) + 1,
        "maxsequence",
        VAR thisprice = Data[Price]
        VAR thisqty = Data[Qty]
        RETURN
            CALCULATE (
                SUM ( Data[Qty] ),
                ALL ( Data[Price], Data[Qty] ),
                Data[Price] < thisprice
            ) + thisqty
    )
VAR numberseries =
    ADDCOLUMNS (
        GENERATESERIES ( 1, numberofvalues, 1 ),
        "PriceValue",
        VAR thisnumber = [Value]
        RETURN
            MINX (
                FILTER (
                    sequencetable,
                    AND ( [minsequence] <= thisnumber, [maxsequence] >= thisnumber )
                ),
                Data[Price]
            )
    )
RETURN
    MEDIANX(numberseries, [PriceValue])


Std Dev = 
VAR numberofvalues =
    SUM ( Data[Qty] )
VAR sequencetable =
    ADDCOLUMNS (
        SUMMARIZE ( Data, Data[Price], Data[Qty] ),
        "minsequence",
        VAR thisprice = Data[Price]
        RETURN
            CALCULATE (
                SUM ( Data[Qty] ),
                ALL ( Data[Price], Data[Qty] ),
                Data[Price] < thisprice
            ) + 1,
        "maxsequence",
        VAR thisprice = Data[Price]
        VAR thisqty = Data[Qty]
        RETURN
            CALCULATE (
                SUM ( Data[Qty] ),
                ALL ( Data[Price], Data[Qty] ),
                Data[Price] < thisprice
            ) + thisqty
    )
VAR numberseries =
    ADDCOLUMNS (
        GENERATESERIES ( 1, numberofvalues, 1 ),
        "PriceValue",
        VAR thisnumber = [Value]
        RETURN
            MINX (
                FILTER (
                    sequencetable,
                    AND ( [minsequence] <= thisnumber, [maxsequence] >= thisnumber )
                ),
                Data[Price]
            )
    )
RETURN
    STDEVX.P(numberseries, [PriceValue])


Pearson's Skewness = 3*([Mean] - [Median])/[Std Dev]
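The Median and Std Dev measures build the repeated-value table virtually: each distinct (Price, Qty) row gets a sequence range from the cumulative quantity of strictly lower prices, and each GENERATESERIES position is mapped back to the price whose range contains it.  A rough Python sketch of that same logic (not the DAX itself), using product A's rows from the example:

```python
import statistics

data = [(0.90, 2), (1.00, 1), (1.10, 1)]  # product A's (Price, Qty) rows

n = sum(q for _, q in data)  # SUM ( Data[Qty] )
seq = []
for price, qty in data:
    # Total quantity of all strictly lower prices, as in the CALCULATE filters.
    below = sum(q for p, q in data if p < price)
    seq.append((price, below + 1, below + qty))  # (price, minsequence, maxsequence)

# GENERATESERIES ( 1, n ): map each position to the price covering that position.
series = [next(p for p, lo, hi in seq if lo <= i <= hi) for i in range(1, n + 1)]

median = statistics.median(series)  # what MEDIANX returns over the virtual table
stdev = statistics.pstdev(series)   # what STDEVX.P returns
```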

 

Note: Mode may return an error if two prices occur the same number of times.  I focused on building the virtual table for Median and StdDev, and may edit this post later with a more robust Mode (I ran out of time for now).  Hopefully you can use this approach to build the virtual table with the repeated values to meet your needs.

 

[Screenshot: pearson.png]

 

If this works for you, please mark it as the solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube



4 REPLIES
v-kelly-msft
Community Support

Hi @Anonymous ,

 

First create a calculated column as below:

Column = REPT(CONCATENATE("$"&'Table'[Price]," "),'Table'[Qty])

Then create a measure as below:

List for A = CONCATENATEX(FILTER(ALL('Table'),'Table'[Product]="A"),'Table'[Column]," ")

And you will see:

[Screenshot: Annotation 2020-07-24 100334.png]

For details, please see the attached file.

 

Best Regards,
Kelly
Did I answer your question? Mark my post as a solution!
mahoneypat
Microsoft Employee

I would just add a couple of steps to your existing query to create the extra rows, so your downstream analysis will be easy.  Here is an example of how you can use List.Repeat and the Qty value to get the modified table from your example data.  To see how it works, create a blank query, go to Advanced Editor, and replace the text there with the M code below.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUVIx0LM0ANJGSrE6UBFDPQOQiCGyiCFCxAlVTSwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product = _t, Price = _t, Qty = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Product", type text}, {"Price", Currency.Type}, {"Qty", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each List.Repeat({[Price]}, [Qty])),
    #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Expanded Custom",{{"Custom", Currency.Type}})
in
    #"Changed Type1"
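For comparison, the equivalent expansion in Python is the `index.repeat` idiom (a sketch assuming a pandas workflow rather than the M query above):

```python
import pandas as pd

df = pd.DataFrame({"Product": ["A", "A", "A", "B"],
                   "Price": [0.90, 1.00, 1.10, 1.00],
                   "Qty": [2, 1, 1, 1]})

# Repeat each row Qty times and keep only Product/Price -- the same effect
# as List.Repeat followed by Table.ExpandListColumn.
expanded = (df.loc[df.index.repeat(df["Qty"]), ["Product", "Price"]]
              .reset_index(drop=True))
```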

 

If this works for you, please mark it as the solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat







Anonymous
Not applicable

Hi @mahoneypat ,

 

While this solution works in principle, it's failing in application for my use case.  The problem I'm running into is the incredibly resource-inefficient processing going on behind the scenes in Power Query.  Here is my M script:

 

let
    Source = Oracle.Database("DSPROD", [HierarchicalNavigation=true, Query="
select DIM_CUST_CURR_ID,
       dim_item_curr_id,
       UNIT_PRICE,
       sum(sell_qty) as PRMRY_SELL_QTY
  from FOO
where dim_ordr_dt_id >= TO_NUMBER(TO_CHAR(SYSDATE-31,'YYYYMMDD'))
having round(sum(sell_qty),0) > 0
GROUP BY DIM_CUST_CURR_ID,
         dim_item_curr_id,
         UNIT_PRICE
--FETCH FIRST 10 ROWS ONLY"]),
    #"Added Custom" = Table.AddColumn(Source, "Price", each List.Repeat({[UNIT_PRICE]}, Number.Round([PRMRY_SELL_QTY], 0))),
    #"Expanded Prices_1" = Table.ExpandListColumn(#"Added Custom", "Price"),
    #"Removed Other Columns" = Table.SelectColumns(#"Expanded Prices_1",{"DIM_CUST_CURR_ID", "DIM_ITEM_CURR_ID", "UNIT_PRICE"})
in
    #"Removed Other Columns"

 

 

When I download a 40-day transaction history, I begin with a dataframe that is 4 columns wide (customer account number, item number, price, qty) and ~1MM records long; the corresponding CSV (the same query is kicked out by another SQL program for debugging purposes) is ~25MB.  After transforming accordingly, I have a dataframe that is 3 columns wide and 900MM+ rows long, and Power Query somehow consumes 42GB of cache on my hard drive.

 

I need to compare this 40-day history to the trailing 30-day sales history, which starts at 4 columns x ~600K rows, transforms into 3 columns x ~650MM rows, and consumes an additional ~30GB of cache on my hard drive.

 

The only way I was able to get my model to refresh was to refresh each table independently by hand, because my computer otherwise runs out of disk space.  Moreover, I suspect this processing won't fly when refreshing through the online service.  Fortunately, the end result increases the size of the .pbix file by less than 50MB, so I know the problem isn't the amount of data, but rather that Power Query isn't the right tool for the job.  Sadly, I can only use Python or R to perform the manipulations if I use a personal gateway instead of the enterprise gateway 😞
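Given that blowup, it is worth noting (an editorial sketch, not from the thread) that all of the target statistics can be computed from the weighted (price, qty) pairs directly, without ever materializing the repeated rows; the weighted median just walks the cumulative quantity:

```python
import math

def weighted_stats(pairs):
    """pairs: iterable of (price, qty). Returns (mean, median, population stdev)."""
    pairs = sorted(pairs)
    n = sum(q for _, q in pairs)
    mean = sum(p * q for p, q in pairs) / n
    var = sum(q * (p - mean) ** 2 for p, q in pairs) / n

    def value_at(k):
        # Price of the k-th observation (1-based) in the sorted expanded list.
        cum = 0
        for p, q in pairs:
            cum += q
            if cum >= k:
                return p

    if n % 2 == 1:
        median = value_at(n // 2 + 1)
    else:
        median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
    return mean, median, math.sqrt(var)

mean, median, stdev = weighted_stats([(0.90, 2), (1.00, 1), (1.10, 1)])
```

This runs over the number of distinct (price, qty) rows (~1MM here) rather than the expanded row count (900MM+), so it avoids the cache blowup entirely.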
