Power BI is turning 10! Tune in for a special live episode on July 24 with behind-the-scenes stories, product evolution highlights, and a sneak peek at what’s in store for the future.
Save the dateEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hello
I followed Stephen Roughley's brilliant post:
https://stephenroughley.com/2022/02/28/simple-linear-regression-in-power-query/
and it almost worked for me. My problem is I have my xcol repeated in the dataset, because I have two more dimensions. And the forecast should be done for each pair of theese two dimensions. I guess the solution is in the "range" part of the code, but I can't figure out how.
Any tips please?
Solved! Go to Solution.
Here's an example similar to yours that demonstrates what I was suggesting:
let
Forecast = (sourceTable as table, xcol as text, ycol as text) =>
let
range = Table.SelectRows(sourceTable, each Record.Field(_, ycol) <> 0),
rowcount = Table.RowCount(range),
sumx = List.Sum(Table.Column(range, xcol)),
sumx2 = List.Sum(List.Transform(Table.Column(range, xcol), each Number.Power(_, 2))),
sumy = List.Sum(Table.Column(range, ycol)),
sumxy = List.Sum(Table.TransformRows(range, each Record.Field(_, xcol) * Record.Field(_, ycol))),
avgx = List.Average(Table.Column(range, xcol)),
avgy = List.Average(Table.Column(range, ycol)),
Slope = ((rowcount * sumxy - sumx * sumy) / (rowcount * sumx2 - Number.Power(sumx, 2))),
Intercept = avgy - Slope * avgx,
Result = Table.AddColumn(sourceTable, Text.Combine({ycol, "Forecast"}), each
if Record.Field(_, ycol) = 0
then Intercept + Slope * Record.Field(_, xcol)
else 0, type number )
in
Result,
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZZBRCsAgDEPv0m8/pkbHziJ+bPc/xIw4XOhHCTxC0rY1ixbsHkNNxXrYKHGioEznKQhjDiHFkSrkWXUoglgHdbEuqwsuatel3y1VCDWqaZ4CQXBJmv0tfimiVkVzcUVwUT58/am/", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [dimA = _t, dimB = _t, dimX = _t, FactY = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"dimA", Int64.Type}, {"dimB", type text}, {"dimX", Int64.Type}, {"FactY", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"dimA", "dimB"}, {{"SubTable", each Forecast(_, "dimX", "FactY"), type table [dimA=nullable number, dimB=nullable text, dimX=nullable number, FactY=nullable number, FactYForecast=nullable number]}}),
#"Expanded SubTable" = Table.ExpandTableColumn(#"Grouped Rows", "SubTable", {"dimX", "FactY", "FactYForecast"}, {"dimX", "FactY", "FactYForecast"})
in
#"Expanded SubTable"
The highlighted column is the result:
Here's an example similar to yours that demonstrates what I was suggesting:
let
Forecast = (sourceTable as table, xcol as text, ycol as text) =>
let
range = Table.SelectRows(sourceTable, each Record.Field(_, ycol) <> 0),
rowcount = Table.RowCount(range),
sumx = List.Sum(Table.Column(range, xcol)),
sumx2 = List.Sum(List.Transform(Table.Column(range, xcol), each Number.Power(_, 2))),
sumy = List.Sum(Table.Column(range, ycol)),
sumxy = List.Sum(Table.TransformRows(range, each Record.Field(_, xcol) * Record.Field(_, ycol))),
avgx = List.Average(Table.Column(range, xcol)),
avgy = List.Average(Table.Column(range, ycol)),
Slope = ((rowcount * sumxy - sumx * sumy) / (rowcount * sumx2 - Number.Power(sumx, 2))),
Intercept = avgy - Slope * avgx,
Result = Table.AddColumn(sourceTable, Text.Combine({ycol, "Forecast"}), each
if Record.Field(_, ycol) = 0
then Intercept + Slope * Record.Field(_, xcol)
else 0, type number )
in
Result,
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZZBRCsAgDEPv0m8/pkbHziJ+bPc/xIw4XOhHCTxC0rY1ixbsHkNNxXrYKHGioEznKQhjDiHFkSrkWXUoglgHdbEuqwsuatel3y1VCDWqaZ4CQXBJmv0tfimiVkVzcUVwUT58/am/", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [dimA = _t, dimB = _t, dimX = _t, FactY = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"dimA", Int64.Type}, {"dimB", type text}, {"dimX", Int64.Type}, {"FactY", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"dimA", "dimB"}, {{"SubTable", each Forecast(_, "dimX", "FactY"), type table [dimA=nullable number, dimB=nullable text, dimX=nullable number, FactY=nullable number, FactYForecast=nullable number]}}),
#"Expanded SubTable" = Table.ExpandTableColumn(#"Grouped Rows", "SubTable", {"dimX", "FactY", "FactYForecast"}, {"dimX", "FactY", "FactYForecast"})
in
#"Expanded SubTable"
The highlighted column is the result:
Thank you so much for the time.
This works, but it is way too complicated for me. I need to understand my code.
I have come to realize that the problem has nothing to do with the linear regression, but rather to figure out how I can reference the exact rows I need for the calculation. I will post a new question for that.
At a high level, this is fairly straightforward. I defined a function that adds a column to a table and then instead of applying that function to the whole data table at once, I applied it to a bunch of subtables (determined by the dimension grouping) separately. That's why I referenced my answer here. It's pretty much the same question except with a more complicated function than Table.AddIndexColumn.
You can group by your two dimensions and use the Forecast function as the aggregation function for each subtable.
This is similar to my answer here but instead of
each Table.AddIndexColumn(_, "Order", 1, 1)
you'd use the Forecast function and it would look something like
each Forecast(_, "Date Col Name", "X Col Name", "Y Col Name")
Thank you
I am not quite sure where to do the grouping.
The code for the fomula looks like this (copy of Stephens code):
let
Forecast = (sourceTable as table, xcol as text, ycol as text) =>
let
range = Table.SelectRows(sourceTable, each Record.Field(_, xcol) > 0),
rowcount = Table.RowCount(range),
sumx = List.Sum(Table.Column(range, xcol)),
sumx2 = List.Sum(List.Transform(Table.Column(range, xcol), each Number.Power(_, 2))),
sumy = List.Sum(Table.Column(range, ycol)),
sumxy = List.Sum(Table.TransformRows(range, each Record.Field(_, xcol) * Record.Field(_, ycol))),
avgx = List.Average(Table.Column(range, xcol)),
avgy = List.Average(Table.Column(range, ycol)),
Slope = ((rowcount * sumxy - sumx * sumy) / (rowcount * sumx2 - Number.Power(sumx, 2))),
Intercept = avgy - Slope * avgx,
Result = Table.AddColumn(sourceTable, Text.Combine({ycol, "Forecast"}), each if (Record.Field(_, xcol) <1) then Intercept + Slope * Record.Field(_, xcol) else Record.Field(_, ycol), type number )
in
Result
in
Forecast
If you provide some sample data, I can show you how to use that function.
That is very kind.
This is the same structure as my dataset:
dimA | dimB | dimX | FactY |
1 | a | 1 | 18 |
1 | a | 2 | 21 |
1 | a | 3 | 17 |
1 | b | 1 | 45 |
1 | b | 2 | 43 |
1 | b | 3 | 40 |
2 | a | 1 | 0 |
2 | a | 2 | 10 |
2 | a | 3 | 5 |
2 | b | 1 | 90 |
2 | b | 2 | 55 |
2 | b | 3 | 87 |
So the problem is, I need to first select only the subset of the inputtable (I do it all in the same table just like Stephen. My inputtable is therefore the previous step in the query) where dimA has the value of dimA in that row (same for dimB) and then an additional condition on the dimX (xcol) >0.
I tried something like:
range = Table.SelectRows(sourceTable, each Record.Field(_, xcol) > 0), and Record.Field(_, dimA) = [dimA] and Record.Field(_, dimB)=[dimB]) but that doesn't work.
The calculation of the forecast should be done for each pair of dimA, dimB and all their x's and y's
Like this subset:
Oh sorry. And there would be some rows with dimX being zero or negative (the future years) and no values for FactY
Check out the July 2025 Power BI update to learn about new features.
This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.