Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
Anonymous
Not applicable

Linear regression in Power Query with grouping

Hello

I followed Stephen Roughley's brilliant post:

https://stephenroughley.com/2022/02/28/simple-linear-regression-in-power-query/

and it almost worked for me. My problem is I have my xcol repeated in the dataset, because I have two more dimensions. And the forecast should be done for each pair of theese two dimensions. I guess the solution is in the "range" part of the code, but I can't figure out how.

Any tips please?

1 ACCEPTED SOLUTION
AlexisOlson
Super User
Super User

Here's an example similar to yours that demonstrates what I was suggesting:

let
    Forecast = (sourceTable as table, xcol as text, ycol as text) =>
    let
        range = Table.SelectRows(sourceTable, each Record.Field(_, ycol) <> 0),
        rowcount = Table.RowCount(range),
        sumx  = List.Sum(Table.Column(range, xcol)),
        sumx2 = List.Sum(List.Transform(Table.Column(range, xcol), each Number.Power(_, 2))),
        sumy  = List.Sum(Table.Column(range, ycol)),
        sumxy = List.Sum(Table.TransformRows(range, each Record.Field(_, xcol) * Record.Field(_, ycol))),
        avgx  = List.Average(Table.Column(range, xcol)),
        avgy  = List.Average(Table.Column(range, ycol)),
        Slope = ((rowcount * sumxy - sumx * sumy) / (rowcount * sumx2 - Number.Power(sumx, 2))),
        Intercept = avgy - Slope * avgx,
        Result = Table.AddColumn(sourceTable, Text.Combine({ycol, "Forecast"}), each
            if Record.Field(_, ycol) = 0
            then Intercept + Slope * Record.Field(_, xcol)
            else 0, type number )
    in
        Result,
        
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZZBRCsAgDEPv0m8/pkbHziJ+bPc/xIw4XOhHCTxC0rY1ixbsHkNNxXrYKHGioEznKQhjDiHFkSrkWXUoglgHdbEuqwsuatel3y1VCDWqaZ4CQXBJmv0tfimiVkVzcUVwUT58/am/", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [dimA = _t, dimB = _t, dimX = _t, FactY = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"dimA", Int64.Type}, {"dimB", type text}, {"dimX", Int64.Type}, {"FactY", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"dimA", "dimB"}, {{"SubTable", each Forecast(_, "dimX", "FactY"), type table [dimA=nullable number, dimB=nullable text, dimX=nullable number, FactY=nullable number, FactYForecast=nullable number]}}),
    #"Expanded SubTable" = Table.ExpandTableColumn(#"Grouped Rows", "SubTable", {"dimX", "FactY", "FactYForecast"}, {"dimX", "FactY", "FactYForecast"})
in
    #"Expanded SubTable"

 

The highlighted column is the result:

AlexisOlson_1-1659131647702.png

View solution in original post

9 REPLIES 9
AlexisOlson
Super User
Super User

Here's an example similar to yours that demonstrates what I was suggesting:

let
    Forecast = (sourceTable as table, xcol as text, ycol as text) =>
    let
        range = Table.SelectRows(sourceTable, each Record.Field(_, ycol) <> 0),
        rowcount = Table.RowCount(range),
        sumx  = List.Sum(Table.Column(range, xcol)),
        sumx2 = List.Sum(List.Transform(Table.Column(range, xcol), each Number.Power(_, 2))),
        sumy  = List.Sum(Table.Column(range, ycol)),
        sumxy = List.Sum(Table.TransformRows(range, each Record.Field(_, xcol) * Record.Field(_, ycol))),
        avgx  = List.Average(Table.Column(range, xcol)),
        avgy  = List.Average(Table.Column(range, ycol)),
        Slope = ((rowcount * sumxy - sumx * sumy) / (rowcount * sumx2 - Number.Power(sumx, 2))),
        Intercept = avgy - Slope * avgx,
        Result = Table.AddColumn(sourceTable, Text.Combine({ycol, "Forecast"}), each
            if Record.Field(_, ycol) = 0
            then Intercept + Slope * Record.Field(_, xcol)
            else 0, type number )
    in
        Result,
        
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZZBRCsAgDEPv0m8/pkbHziJ+bPc/xIw4XOhHCTxC0rY1ixbsHkNNxXrYKHGioEznKQhjDiHFkSrkWXUoglgHdbEuqwsuatel3y1VCDWqaZ4CQXBJmv0tfimiVkVzcUVwUT58/am/", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [dimA = _t, dimB = _t, dimX = _t, FactY = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"dimA", Int64.Type}, {"dimB", type text}, {"dimX", Int64.Type}, {"FactY", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"dimA", "dimB"}, {{"SubTable", each Forecast(_, "dimX", "FactY"), type table [dimA=nullable number, dimB=nullable text, dimX=nullable number, FactY=nullable number, FactYForecast=nullable number]}}),
    #"Expanded SubTable" = Table.ExpandTableColumn(#"Grouped Rows", "SubTable", {"dimX", "FactY", "FactYForecast"}, {"dimX", "FactY", "FactYForecast"})
in
    #"Expanded SubTable"

 

The highlighted column is the result:

AlexisOlson_1-1659131647702.png

Anonymous
Not applicable

Thank you so much for the time. 
This works, but it is way too complicated for me. I need to understand my code. 

 

I have come to realize that the problem has nothing to do with the linear regression, but rather to figure out how I can reference the exact rows I need for the calculation. I will post a new question for that. 

At a high level, this is fairly straightforward. I defined a function that adds a column to a table and then instead of applying that function to the whole data table at once, I applied it to a bunch of subtables (determined by the dimension grouping) separately. That's why I referenced my answer here. It's pretty much the same question except with a more complicated function than Table.AddIndexColumn.

AlexisOlson
Super User
Super User

You can group by your two dimensions and use the Forecast function as the aggregation function for each subtable.

 

This is similar to my answer here but instead of

each Table.AddIndexColumn(_, "Order", 1, 1)

you'd use the Forecast function and it would look something like

each Forecast(_, "Date Col Name", "X Col Name", "Y Col Name")
Anonymous
Not applicable

Thank you

I am not quite sure where to do the grouping. 
The code for the fomula looks like this (copy of Stephens code):

let

    Forecast = (sourceTable as table, xcol as text, ycol as text) =>

        let

            range = Table.SelectRows(sourceTable, each Record.Field(_, xcol) > 0),

            rowcount = Table.RowCount(range),

            sumx = List.Sum(Table.Column(range, xcol)),

            sumx2 = List.Sum(List.Transform(Table.Column(range, xcol), each Number.Power(_, 2))),

            sumy = List.Sum(Table.Column(range, ycol)),

            sumxy = List.Sum(Table.TransformRows(range, each Record.Field(_, xcol) * Record.Field(_, ycol))),

            avgx = List.Average(Table.Column(range, xcol)),

            avgy = List.Average(Table.Column(range, ycol)),

            Slope = ((rowcount * sumxy - sumx * sumy) / (rowcount * sumx2 - Number.Power(sumx, 2))),

            Intercept = avgy - Slope * avgx,

            Result = Table.AddColumn(sourceTable, Text.Combine({ycol, "Forecast"}), each if (Record.Field(_, xcol) <1) then Intercept + Slope * Record.Field(_, xcol) else Record.Field(_, ycol), type number )

        in

            Result

in

    Forecast

If you provide some sample data, I can show you how to use that function.

Anonymous
Not applicable

That is very kind. 
This is the same structure as my dataset:

 

dimAdimBdimXFactY
1a118
1a221
1a317
1b145
1b243
1b340
2a10
2a210
2a35
2b190
2b255
2b387

 

So the problem is, I need to first select only the subset of the inputtable (I do it all in the same table just like Stephen. My inputtable is therefore the previous step in the query) where dimA has the value of dimA in that row (same for dimB) and then an additional condition on the dimX (xcol) >0.

 

I tried something like:
      range = Table.SelectRows(sourceTable, each Record.Field(_, xcol) > 0), and Record.Field(_, dimA) = [dimA] and Record.Field(_, dimB)=[dimB]) but that doesn't work. 

Anonymous
Not applicable

The calculation of the forecast should be done for each pair of dimA, dimB and all their x's and y's

 

Like this subset:

bergholten_0-1659128166242.png

 

Anonymous
Not applicable

Oh sorry. And there would be some rows with dimX being zero or negative (the future years) and no values for FactY

Helpful resources

Announcements
July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.

Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors