- Microsoft Power BI Community
- Welcome to the Community!
- News & Announcements
- Get Help with Power BI
- Desktop
- Service
- Report Server
- Power Query
- Mobile Apps
- Developer
- DAX Commands and Tips
- Custom Visuals Development Discussion
- Health and Life Sciences
- Power BI Spanish Community
- Translated Spanish Desktop
- Power Platform Integration - Better Together!
- Power Platform Integrations
- Power Platform and Dynamics 365 Integrations
- Training and Consulting
- Instructor Led Training
- Galleries
- Community Connections & How-To Videos
- COVID-19 Data Stories Gallery
- Themes Gallery
- Data Stories Gallery
- R Script Showcase
- Webinars and Video Gallery
- Quick Measures Gallery
- 2021 MSBizAppsSummit Gallery
- 2020 MSBizAppsSummit Gallery
- 2019 MSBizAppsSummit Gallery
- Events
- Ideas
- Custom Visuals Ideas
- Issues
- Issues
- Events
- Upcoming Events
- Community Engagement
- T-Shirt Design Challenge 2023
- Community Blog
- Power BI Community Blog
- Custom Visuals Community Blog
- Community Support
- Community Accounts & Registration
- Using the Community
- Community Feedback

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Microsoft Power BI Community
- Galleries
- Quick Measures Gallery
- Simple Linear Regression

09-07-2017 06:17 AM - last edited 04-12-2018 23:19 PM

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Simple Linear Regression

09-07-2017
06:17 AM

This measure allows you to predict dependent values Y from independent values X.

Simple linear regression

Estimate Y values based on X values.

Name: Category

Tooltip: The category for which you have known X and Y values

Type: Categorical field

Name: Measure X

Tooltip: Known X (independent) values

Type: Numerical field / measure

Name: Measure Y

Tooltip: Known Y (dependent) values

Type: Numerical field / measure

```
Estimated {Measure Y} =
VAR Known =
FILTER (
SELECTCOLUMNS (
ALLSELECTED ( {Category} ),
"Known[X]", CALCULATE ( {Measure X} ),
"Known[Y]", CALCULATE ( {Measure Y} )
),
AND (
NOT ( ISBLANK ( Known[X] ) ),
NOT ( ISBLANK ( Known[Y] ) )
)
)
VAR Count_Items =
COUNTROWS ( Known )
VAR Sum_X =
SUMX ( Known, Known[X] )
VAR Sum_X2 =
SUMX ( Known, Known[X] ^ 2 )
VAR Sum_Y =
SUMX ( Known, Known[Y] )
VAR Sum_XY =
SUMX ( Known, Known[X] * Known[Y] )
VAR Average_X =
AVERAGEX ( Known, Known[X] )
VAR Average_Y =
AVERAGEX ( Known, Known[Y] )
VAR Slope =
DIVIDE (
Count_Items * Sum_XY - Sum_X * Sum_Y,
Count_Items * Sum_X2 - Sum_X ^ 2
)
VAR Intercept =
Average_Y - Slope * Average_X
RETURN
Intercept + Slope * {Measure X}
```

For more details on and other uses of this quick measure, see my blog post on the subject:

https://xxlbi.com/blog/simple-linear-regression-in-dax/

eyJrIjoiZWNiNTQ2MGEtMjhlNS00YTNhLWE2NTktZDg3MzcxMjQ0NDc4IiwidCI6ImQzMmNkYzNmLTY1NTUtNGNhYy1iYjFhLTg2OWZiMTE0MzRlNSJ9

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

08-10-2023
08:54 PM

Hi Daniil,

Thanks a lot for your code, it actually helped me a lot to understand way better DAX syntax and measures. I have a comment, more related to the logic and statistical part of the code than to the coding perse. I believe that 9th line of code, instead of having an AND you should have an OR. My logic is the following, imagine you have only four data points for variables X and Y:

X Y

0 2

3 1

4 0

0 0

According to your code, "AND (NOT ( ISBLANK ( Known[X] ) ),NOT ( ISBLANK ( Known[Y] ) ))" you are filtering and taking only the data points that are not zero on X AND are not zero on Y. That, in my example, would exclude from the calculation all but the second data point. I believe this is a statistical error, since lines one and three from the example provide information that you should take into account when making linear regression. On the other hand, if you change that line for "OR(NOT ( ISBLANK ( Known[X] ) ),NOT ( ISBLANK ( Known[Y] ) ))", you are filtering and taking the data points that are not zero on X OR not zero on Y. In the example provided, that, takes out only the fourth data point which doesn't provide any statistically significant information, at least from my perspective. Would you mind clearing this out for me?

Regards

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

04-06-2021
02:46 PM

I've dabbled in multiple linear regression. Here's what it looks like with three x variables.

```
Regression Coefficients =
VAR ShortNames =
SELECTCOLUMNS (
Returns,
"A", [Equity], /*Known X1 values*/
"D", [Duration], /*Known X2 values*/
"C", [Credit], /*Known X3 values*/
"Y", [Manager] /*Known Y values*/
)
VAR n = COUNTROWS ( ShortNames )
VAR A = SUMX ( ShortNames, [A] )
VAR D = SUMX ( ShortNames, [D] )
VAR C = SUMX ( ShortNames, [C] )
VAR Y = SUMX ( ShortNames, [Y] )
VAR AA = SUMX ( ShortNames, [A] * [A] ) - A * A / n
VAR DD = SUMX ( ShortNames, [D] * [D] ) - D * D / n
VAR CC = SUMX ( ShortNames, [C] * [C] ) - C * C / n
VAR AD = SUMX ( ShortNames, [A] * [D] ) - A * D / n
VAR AC = SUMX ( ShortNames, [A] * [C] ) - A * C / n
VAR DC = SUMX ( ShortNames, [D] * [C] ) - D * C / n
VAR AY = SUMX ( ShortNames, [A] * [Y] ) - A * Y / n
VAR DY = SUMX ( ShortNames, [D] * [Y] ) - D * Y / n
VAR CY = SUMX ( ShortNames, [C] * [Y] ) - C * Y / n
VAR BetaA =
DIVIDE (
AY*DC*DC - AD*CY*DC - AY*CC*DD + AC*CY*DD + AD*CC*DY - AC*DC*DY,
AD*CC*AD - AC*DC*AD - AD*AC*DC + AA*DC*DC + AC*AC*DD - AA*CC*DD
)
VAR BetaD =
DIVIDE (
AY*CC*AD - AC*CY*AD - AY*AC*DC + AA*CY*DC + AC*AC*DY - AA*CC*DY,
AD*CC*AD - AC*DC*AD - AD*AC*DC + AA*DC*DC + AC*AC*DD - AA*CC*DD
)
VAR BetaC =
DIVIDE (
- AY*DC*AD + AD*CY*AD + AY*AC*DD - AA*CY*DD - AD*AC*DY + AA*DC*DY,
AD*CC*AD - AC*DC*AD - AD*AC*DC + AA*DC*DC + AC*AC*DD - AA*CC*DD
)
VAR Intercept =
AVERAGEX ( ShortNames, [Y] )
- AVERAGEX ( ShortNames, [A] ) * BetaA
- AVERAGEX ( ShortNames, [D] ) * BetaD
- AVERAGEX ( ShortNames, [C] ) * BetaC
RETURN
{ BetaA, BetaD, BetaC, Intercept } /*Pick One*/
```

(From my StackOverflow post.)

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

02-12-2020
04:15 AM

Hi @Daniil,

I'm trying to find the slope of a graph comparing 2 pollution datasets. I altered your measure to do this. The measure works well until it comes to summing the X^2 and XY values. The multiplication doesn't work and this affects the slope calculation.

What could be causing this problem?

PS: I'm very inexperienced with Power BI, so forgive me if this is a stupid question related to SUM and SUMX.

Anonymous

Not applicable

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-01-2017
04:48 AM

Really excellent stuff !!

I had managed to do a linear regression line previously but this is so much cleaner

I did have to tweak yours slightly

1) Instead of a plain old Allselected i needed to

do

CALCULATETABLE (

SUMMARIZE (

FeederHistory,

FeederHistory[EventDateHour],

FeederHistory[EventDate]

),

ALLSELECTED ( FeederHistory )

),

and for some reason SSRS moaned at me for using SELCTEDVALUE so I replaced it with the older hasonevalue values paradigm

ie

RETURN

Intercept

+ Slope

* IF (

HASONEVALUE ( FeederHistory[EventDate] ),

VALUES ( FeederHistory[EventDate] ))

Anonymous

Not applicable

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-01-2017
04:45 AM

Really excellent stuff !!

I had managed to do a linear regression line previously but this is so much cleaner

I did have to tweak yours slightly

1) Instead of a plain old Allselected i needed to

do

CALCULATETABLE (

SUMMARIZE (

FeederHistory,

FeederHistory[EventDateHour],

FeederHistory[EventDate]

),

ALLSELECTED ( FeederHistory )

),

and for some reason SSRS moaned at me for using SELCTEDVALUE so I replaced it with the older hasonevalue values paradigm

ie

RETURN

Intercept

+ Slope

* IF (

HASONEVALUE ( FeederHistory[EventDate] ),

VALUES ( FeederHistory[EventDate] ))

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-01-2017
09:04 AM

Hi

Thanks for your comments. If, like me you ever need to get the first and last values for a given filtered interval to calculate the total trend change over time, or rate of trend change over time, the Starting and Ending measures are great. When you get something like this working it really puts a smile on your face! With these complicated measures I can highly recomment DAX formatter if you are not already using it.

Tony

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-01-2017
04:51 AM

I know this is a Power Bi portal but I had to make it work for both PBI and SSRS

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

11-27-2017
05:16 AM

Hi

I am using your Simple Linear Regression measure with great success. Thank You.

I am doing temperature studies and I want to find the first and last values of your measure in a given time interval which is filtered by a slicer. By doing this I can calculate the difference in the trend line over the interval of filtered years.

I can find the first and last **year** of the filtered interval by using FIRSTNONBLANK and LASTNONBLANK but I can't do this with your code because FIRSTNONBLANK and LASTNONBLANK only accept a column as an argument. I cannot seem to convert your measure into a calculated column to do this. Is there a version of your measure that produces a column? My data has two columns **year** and **temperature. **X and Y.

Any help would be much appreciated.

Tony Maclaren

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

11-28-2017
12:30 AM

Thanks for your feedback, Tony!

If I understand you correctly, there are a few ways to achieve your goal.

Here is an example of a measure:

Starting Temperature = VAR Known = FILTER ( SELECTCOLUMNS ( ALLSELECTED ( 'Table'[Year] ), "Known[X]", 'Table'[Year] ), "Known[Y]", [Temperature] ), AND ( NOT ( ISBLANK ( Known[X] ) ), NOT ( ISBLANK ( Known[Y] ) ) ) ) VAR First = TOPN ( 1, Known, Known[X], ASC ) RETURN MINX ( First, Known[Y] )

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

11-28-2017
11:35 AM

Dear Daniil

Thank you for your very prompt response. I will be in Sydney, where I grew up, in early December!

Here is the tweaked code below (minus some brackets) that does the trick.Thank you so much.

This is an** incredibly useful** companion measure to your Simple Linear Regression measure, and because it works with measures it will find the equivalent of FIRSTNONBLANK in **a measure**. However it does not work correctly in the filtered context of a YEAR slider to filter the interval of years which the Simple Linear Regression measure does .The value changes with the slider, but the results do not correctly match the estimated starting value of the **Estimated** measure except for the first value of the whole dataset.

Starting Temperature =

VAR Known =

FILTER (

SELECTCOLUMNS (

ALLSELECTED ( 'cetdata'[YEAR] ),

"Known[X]", 'cetdata'[YEAR],

"Known[Y]", [Estimated]

),

AND (

NOT ( ISBLANK ( Known[X] ) ),

NOT ( ISBLANK ( Known[Y] ) )

)

)

VAR First =

TOPN ( 1, Known, Known[X], ASC )

RETURN

MINX ( First, Known[Y] )

NOT ( ISBLANK ( Known[Y] ) )

)

)

VAR First =

TOPN ( 1, Known, Known[X], ASC )

RETURN

MINX ( First, Known[Y] )

Now we have to fix the filter context and figure out how to find [**Ending temperature**] so we can subtract **[Starting Temperature]** and get the trend change per filtered interval.

Thanks again--great stuff. No one else had a clue what my problem was or how to solve it.

Tony

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

11-28-2017
07:24 PM

Tony, you should visit the local Power BI User Group if you get a chance 🙂 The next meeting date should be announced at Meetup soon.

Try the following measures:

Starting Temperature = VAR Estimate = SELECTCOLUMNS ( KEEPFILTERS ( ALLSELECTED ( 'cetdata'[YEAR] ) ), "Estimate[X]", 'cetdata'[YEAR], "Estimate[Y]", [Estimated] ) VAR First = TOPN ( 1, Estimate, Estimate[X], ASC ) RETURN MINX ( First, Estimate[Y] )

Ending Temperature = VAR Estimate = SELECTCOLUMNS ( KEEPFILTERS ( ALLSELECTED ( 'cetdata'[YEAR] ) ), "Estimate[X]", 'cetdata'[YEAR], "Estimate[Y]", [Estimated] ) VAR Last = TOPN ( 1, Estimate, Estimate[X], DESC ) RETURN MAXX ( Last, Estimate[Y] )

Temperature Difference = [Ending Temperature] - [Starting Temperature]

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

11-29-2017
05:43 AM

Dear Daniil

Thank you very much--all working perfectly now. KEEPFILTERS did the trick.I have really learned a lot! I have now ordered the SQLBI book.

One strange thing however,The [Estimated] measure which is your Simple linear Regression measure produces a total if displayed in a table. I do not quite understand this.

Best Regards

and thanks again

Tony

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-01-2017
07:56 PM

@tonymaclaren, it is likely that at the grand total level what you see is not a total, but the Intercept. This is because the following expression evaluates to Intercept:

Intercept + Slope * SELECTEDVALUE ( 'cetdata'[YEAR] )

At the grant total level, there is usually more than one 'cetdata'[YEAR], hence SELECTEDVALUE ( 'cetdata'[YEAR] ) returns BLANK, turning the multiplication into BLANK as well, leaving only Intercept.

I updated my blog post to deal with this kind of situations.

@Anonymous, glad you found this useful!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-02-2017
02:26 AM

Dar Daniil

Thanks for the explanation.The same value also appears at the top of the [Estimated] measure, but only if the whole of the data for ('Ctedata'[YEAR])is displayed. is there a modification to eliminate this?

Tony

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-02-2017
03:20 AM

Tony, it's a bit difficult to advise anything without seeing your report -- can you share a sample of your data?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-02-2017
04:27 AM

Hi Daniil

Sure, no problem.What would you like me to supply and how.The full dataset is for 391 years.

Tony

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-02-2017
05:34 AM

Wow, 391 years of data sounds exciting 🙂 If you could reduce your data to the minimum amount with which you could reproduce your formula problems, that would be great.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-02-2017
07:18 AM

Thanks Daniil

I have attached a subset of the file that exhibits the exact same problem using the following measure:

Estimated tony Data =

VAR Known =

FILTER (

SELECTCOLUMNS (

ALLSELECTED ( 'Tony_data'[YEAR] ),

"Known[X]", 'Tony_data'[YEAR],

"Known[Y]", [Averagetemp]

),

AND (

NOT ( ISBLANK ( Known[X] ) ),

NOT ( ISBLANK ( Known[Y] ) )

)

)

VAR Count_Items =

COUNTROWS ( Known )

VAR Sum_X =

SUMX ( Known, Known[X] )

VAR Sum_X2 =

SUMX ( Known, Known[X] ^ 2 )

VAR Sum_Y =

SUMX ( Known, Known[Y] )

VAR Sum_XY =

SUMX ( Known, Known[X] * Known[Y] )

VAR Average_X =

AVERAGEX ( Known, Known[X] )

VAR Average_Y =

AVERAGEX ( Known, Known[Y] )

VAR Slope =

DIVIDE (

Count_Items * Sum_XY - Sum_X * Sum_Y,

Count_Items * Sum_X2 - Sum_X ^ 2

)

VAR Intercept =

Average_Y - Slope * Average_X

RETURN

Intercept + Slope * SELECTEDVALUE ( 'Tony_data'[YEAR])

Averagetemp = SUM(Tony_data[AverageTemperature])

Tony

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

12-02-2017
01:56 PM

Tony, I can't reproduce the problem, unfortunately. Can you please attach a pbix file and screenshots with the problem highlighted?

Also, have you followed my updated blog post? I don't use SELECTEDVALUE anymore: https://xxlbi.com/blog/simple-linear-regression-in-dax/

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

05-08-2018
07:58 AM

Thanks for the measure Daniil, it is extremely helpful.

I am attempting to apply it to a forecasting problem that is relatively simple in excel, but is causing some trouble for me with Dax/PowerBI. I have 19 different categories that I need to run the regression on, and then I need to sum those categories for use on further regressions. Could your measure be modified to accomplish this or do I need to create 19 different measures?

Thanks for any help you could provide.