Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The ultimate Microsoft Fabric, Power BI, Azure AI & SQL learning event! Join us in Las Vegas from March 26-28, 2024. Use code MSCUST for a $100 discount. Register Now

Reply
pedroluccas
Frequent Visitor

LINEAR REGRESSION - X-AXIS IN TEXT FORMAT

Hello

I am working with a data sample that contains numeric values on "y" axis (pH results) and text values on "x" axis (sample/batch code) and need to build a linear regression analysis.

So far, I found a website "IterationInsights" that published an article "How to do simple linear regression in Power BI", 

which really helped me to analyse using date, but not sample code (since it is a text). I need to evaluate the trend between 2 samples, independent of when they were collected.

Note that if x is a Date, we have: y = 0.0023x - 99.065, r² = 0.2566

But if x is the sample, we have: y = 0.0297x + 5.5252, r² = 0.3024

pedroluccas_0-1664991735357.png

How am I able to build a variable table that converts the sample code to a numeric sequence (A1AA/22 = 1, A2AA/22 = 2, ...., A21AA/22 = 21)?
Besides, it's possible to filter this table? Ex: analyse A12AA/22 to A20AA/22 (interval which would automatically be converted to the numbers: A12AA/22 = 1, A13AA/22 = 2, ....,  A20AA/22 = 9, always starting with 1).

 

Since slope formula is: [n(Σxy) - (Σx)(Σy)] / [n(Σx²) - (Σx)²],

x should be refeered as a sequence of natural numbers representing the samples.

n is the distinct count of the samples.

 

I suppose that I should build a data var table to convert sample code to a number, and use All selected function to filter the samples I'm working with, but I'm really having a hard time to combine these functions and obtain a functional linear regression equation.

I exemplified the solpe formula because I think the same solution would be applied to intercep, r², etc.

 

For slope, I used the formula:

---------------------------------------------------

SLOPE =

 

var data = SELECTCOLUMNS(ALLSELECTED(DATA),

                        "x_values",DATA[DATE],

                        "y_values",DATA[RESULTS]

)

 

--Variables to Solve for:

var y_sum = SUMX(data,[y_values])

var x_sum = SUMX(data,[x_values])

var x2 = SUMX(data,[x_values]^2)

var xy = SUMX(data,[x_values]*[y_values])

var row_count = COUNTROWS(data)

 

--Solve for formula

 

var Slope = ((row_count * xy) - (x_sum * y_sum))/((row_count*x2)-((x_sum)^2))

 

return Slope

-----------------------------------------------------

 

Anyone could help me?

2 REPLIES 2
Greg_Deckler
Super User
Super User

@pedroluccas So, duplicate your query in Power Query Editor. Remove all rows except the SAMPLE CODE column. Do a Remove Duplicates. Create an Index column (Add Columns | Index). You can then either do a merge to get your index or in a DAX calculated column, add a column to your original table that grabs the Index using RELATED.

 

You can find sample code here:

Correlation coefficient - Microsoft Power BI Community

If you have seasonality in your data:

De-Seasonalized Correlation Coefficient - Microsoft Power BI Community


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

Thank you very much!

It worked for Slope and R². But now seems that I have other problem.

When I have a huge amount of samples of different products: A1AA/22 (Product A), B1AA/22 (Product B), C1AA/22 (Product C), etc, it generates a index with huge numbers (10,000 or more).

Other scenario is when I wnat to use just an interval of the samples (sample 5 - 12).

Since Intercept formula is b = [(Σy)*(Σx²) - (Σx) * (Σxy)] / [n * (Σx²) - (Σx)²], and now appears the factor (Σx²) on the numerator, huge numbers seems to affect the calculation.

 

See below the calculation and graphs

 

pedroluccas_0-1665057765038.png

pedroluccas_1-1665057840395.png

 

What can I do to make the index change according with my selection and restart the count with the number 1?

 

 

 

 

Helpful resources

Announcements
Fabric Community Conference

Microsoft Fabric Community Conference

Join us at our first-ever Microsoft Fabric Community Conference, March 26-28, 2024 in Las Vegas with 100+ sessions by community experts and Microsoft engineering.

February 2024 Update Carousel

Power BI Monthly Update - February 2024

Check out the February 2024 Power BI update to learn about new features.

Fabric Career Hub

Microsoft Fabric Career Hub

Explore career paths and learn resources in Fabric.

Fabric Partner Community

Microsoft Fabric Partner Community

Engage with the Fabric engineering team, hear of product updates, business opportunities, and resources in the Fabric Partner Community.

Top Solution Authors