Solved: Re: Linear regression in Power Query with grouping

Anonymous · ‎07-29-2022

Hello

I followed Stephen Roughley's brilliant post:

https://stephenroughley.com/2022/02/28/simple-linear-regression-in-power-query/

and it almost worked for me. My problem is I have my xcol repeated in the dataset, because I have two more dimensions. And the forecast should be done for each pair of theese two dimensions. I guess the solution is in the "range" part of the code, but I can't figure out how.

Any tips please?

AlexisOlson · ‎07-29-2022

Here's an example similar to yours that demonstrates what I was suggesting:

let
    Forecast = (sourceTable as table, xcol as text, ycol as text) =>
    let
        range = Table.SelectRows(sourceTable, each Record.Field(_, ycol) <> 0),
        rowcount = Table.RowCount(range),
        sumx  = List.Sum(Table.Column(range, xcol)),
        sumx2 = List.Sum(List.Transform(Table.Column(range, xcol), each Number.Power(_, 2))),
        sumy  = List.Sum(Table.Column(range, ycol)),
        sumxy = List.Sum(Table.TransformRows(range, each Record.Field(_, xcol) * Record.Field(_, ycol))),
        avgx  = List.Average(Table.Column(range, xcol)),
        avgy  = List.Average(Table.Column(range, ycol)),
        Slope = ((rowcount * sumxy - sumx * sumy) / (rowcount * sumx2 - Number.Power(sumx, 2))),
        Intercept = avgy - Slope * avgx,
        Result = Table.AddColumn(sourceTable, Text.Combine({ycol, "Forecast"}), each
            if Record.Field(_, ycol) = 0
            then Intercept + Slope * Record.Field(_, xcol)
            else 0, type number )
    in
        Result,
        
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZZBRCsAgDEPv0m8/pkbHziJ+bPc/xIw4XOhHCTxC0rY1ixbsHkNNxXrYKHGioEznKQhjDiHFkSrkWXUoglgHdbEuqwsuatel3y1VCDWqaZ4CQXBJmv0tfimiVkVzcUVwUT58/am/", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [dimA = _t, dimB = _t, dimX = _t, FactY = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"dimA", Int64.Type}, {"dimB", type text}, {"dimX", Int64.Type}, {"FactY", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"dimA", "dimB"}, {{"SubTable", each Forecast(_, "dimX", "FactY"), type table [dimA=nullable number, dimB=nullable text, dimX=nullable number, FactY=nullable number, FactYForecast=nullable number]}}),
    #"Expanded SubTable" = Table.ExpandTableColumn(#"Grouped Rows", "SubTable", {"dimX", "FactY", "FactYForecast"}, {"dimX", "FactY", "FactYForecast"})
in
    #"Expanded SubTable"

The highlighted column is the result:

View solution in original post

AlexisOlson · ‎07-29-2022

Here's an example similar to yours that demonstrates what I was suggesting:

let
    Forecast = (sourceTable as table, xcol as text, ycol as text) =>
    let
        range = Table.SelectRows(sourceTable, each Record.Field(_, ycol) <> 0),
        rowcount = Table.RowCount(range),
        sumx  = List.Sum(Table.Column(range, xcol)),
        sumx2 = List.Sum(List.Transform(Table.Column(range, xcol), each Number.Power(_, 2))),
        sumy  = List.Sum(Table.Column(range, ycol)),
        sumxy = List.Sum(Table.TransformRows(range, each Record.Field(_, xcol) * Record.Field(_, ycol))),
        avgx  = List.Average(Table.Column(range, xcol)),
        avgy  = List.Average(Table.Column(range, ycol)),
        Slope = ((rowcount * sumxy - sumx * sumy) / (rowcount * sumx2 - Number.Power(sumx, 2))),
        Intercept = avgy - Slope * avgx,
        Result = Table.AddColumn(sourceTable, Text.Combine({ycol, "Forecast"}), each
            if Record.Field(_, ycol) = 0
            then Intercept + Slope * Record.Field(_, xcol)
            else 0, type number )
    in
        Result,
        
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZZBRCsAgDEPv0m8/pkbHziJ+bPc/xIw4XOhHCTxC0rY1ixbsHkNNxXrYKHGioEznKQhjDiHFkSrkWXUoglgHdbEuqwsuatel3y1VCDWqaZ4CQXBJmv0tfimiVkVzcUVwUT58/am/", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [dimA = _t, dimB = _t, dimX = _t, FactY = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"dimA", Int64.Type}, {"dimB", type text}, {"dimX", Int64.Type}, {"FactY", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"dimA", "dimB"}, {{"SubTable", each Forecast(_, "dimX", "FactY"), type table [dimA=nullable number, dimB=nullable text, dimX=nullable number, FactY=nullable number, FactYForecast=nullable number]}}),
    #"Expanded SubTable" = Table.ExpandTableColumn(#"Grouped Rows", "SubTable", {"dimX", "FactY", "FactYForecast"}, {"dimX", "FactY", "FactYForecast"})
in
    #"Expanded SubTable"

The highlighted column is the result:

Anonymous · ‎07-29-2022

Thank you so much for the time.
This works, but it is way too complicated for me. I need to understand my code.

I have come to realize that the problem has nothing to do with the linear regression, but rather to figure out how I can reference the exact rows I need for the calculation. I will post a new question for that.

AlexisOlson · ‎07-30-2022

At a high level, this is fairly straightforward. I defined a function that adds a column to a table and then instead of applying that function to the whole data table at once, I applied it to a bunch of subtables (determined by the dimension grouping) separately. That's why I referenced my answer here. It's pretty much the same question except with a more complicated function than Table.AddIndexColumn.

AlexisOlson · ‎07-29-2022

You can group by your two dimensions and use the Forecast function as the aggregation function for each subtable.

This is similar to my answer here but instead of

each Table.AddIndexColumn(_, "Order", 1, 1)

you'd use the Forecast function and it would look something like

each Forecast(_, "Date Col Name", "X Col Name", "Y Col Name")

Anonymous · ‎07-29-2022

Thank you

I am not quite sure where to do the grouping.
The code for the fomula looks like this (copy of Stephens code):

let

Forecast = (sourceTable as table, xcol as text, ycol as text) =>

let

range = Table.SelectRows(sourceTable, each Record.Field(_, xcol) > 0),

rowcount = Table.RowCount(range),

sumx = List.Sum(Table.Column(range, xcol)),

sumx2 = List.Sum(List.Transform(Table.Column(range, xcol), each Number.Power(_, 2))),

sumy = List.Sum(Table.Column(range, ycol)),

sumxy = List.Sum(Table.TransformRows(range, each Record.Field(_, xcol) * Record.Field(_, ycol))),

avgx = List.Average(Table.Column(range, xcol)),

avgy = List.Average(Table.Column(range, ycol)),

Slope = ((rowcount * sumxy - sumx * sumy) / (rowcount * sumx2 - Number.Power(sumx, 2))),

Intercept = avgy - Slope * avgx,

Result = Table.AddColumn(sourceTable, Text.Combine({ycol, "Forecast"}), each if (Record.Field(_, xcol) <1) then Intercept + Slope * Record.Field(_, xcol) else Record.Field(_, ycol), type number )

in

Result

in

Forecast

AlexisOlson · ‎07-29-2022

If you provide some sample data, I can show you how to use that function.

Anonymous · ‎07-29-2022

That is very kind.
This is the same structure as my dataset:

dimA	dimB	dimX	FactY
1	a	1	18
1	a	2	21
1	a	3	17
1	b	1	45
1	b	2	43
1	b	3	40
2	a	1	0
2	a	2	10
2	a	3	5
2	b	1	90
2	b	2	55
2	b	3	87

So the problem is, I need to first select only the subset of the inputtable (I do it all in the same table just like Stephen. My inputtable is therefore the previous step in the query) where dimA has the value of dimA in that row (same for dimB) and then an additional condition on the dimX (xcol) >0.

I tried something like:
range = Table.SelectRows(sourceTable, each Record.Field(_, xcol) > 0), and Record.Field(_, dimA) = [dimA] and Record.Field(_, dimB)=[dimB]) but that doesn't work.

Anonymous · ‎07-29-2022

The calculation of the forecast should be done for each pair of dimA, dimB and all their x's and y's

Like this subset:

Anonymous · ‎07-29-2022

Oh sorry. And there would be some rows with dimX being zero or negative (the future years) and no values for FactY

Linear regression in Power Query with grouping

Helpful resources

Power BI Dataviz World Championships

Power BI Monthly Update - December 2025

FabCon Atlanta 2026

FabCon is coming to Atlanta

Linear regression in Power Query with grouping

Helpful resources

Power BI Dataviz World Championships

Power BI Monthly Update - December 2025

FabCon Atlanta 2026