Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
pedroluccas
Frequent Visitor

LINEAR REGRESSION - X-AXIS IN TEXT FORMAT

Hello

I am working with a data sample that contains numeric values on "y" axis (pH results) and text values on "x" axis (sample/batch code) and need to build a linear regression analysis.

So far, I found a website "IterationInsights" that published an article "How to do simple linear regression in Power BI", 

which really helped me to analyse using date, but not sample code (since it is a text). I need to evaluate the trend between 2 samples, independent of when they were collected.

Note that if x is a Date, we have: y = 0.0023x - 99.065, r² = 0.2566

But if x is the sample, we have: y = 0.0297x + 5.5252, r² = 0.3024

pedroluccas_0-1664991735357.png

How am I able to build a variable table that converts the sample code to a numeric sequence (A1AA/22 = 1, A2AA/22 = 2, ...., A21AA/22 = 21)?
Besides, it's possible to filter this table? Ex: analyse A12AA/22 to A20AA/22 (interval which would automatically be converted to the numbers: A12AA/22 = 1, A13AA/22 = 2, ....,  A20AA/22 = 9, always starting with 1).

 

Since slope formula is: [n(Σxy) - (Σx)(Σy)] / [n(Σx²) - (Σx)²],

x should be refeered as a sequence of natural numbers representing the samples.

n is the distinct count of the samples.

 

I suppose that I should build a data var table to convert sample code to a number, and use All selected function to filter the samples I'm working with, but I'm really having a hard time to combine these functions and obtain a functional linear regression equation.

I exemplified the solpe formula because I think the same solution would be applied to intercep, r², etc.

 

For slope, I used the formula:

---------------------------------------------------

SLOPE =

 

var data = SELECTCOLUMNS(ALLSELECTED(DATA),

                        "x_values",DATA[DATE],

                        "y_values",DATA[RESULTS]

)

 

--Variables to Solve for:

var y_sum = SUMX(data,[y_values])

var x_sum = SUMX(data,[x_values])

var x2 = SUMX(data,[x_values]^2)

var xy = SUMX(data,[x_values]*[y_values])

var row_count = COUNTROWS(data)

 

--Solve for formula

 

var Slope = ((row_count * xy) - (x_sum * y_sum))/((row_count*x2)-((x_sum)^2))

 

return Slope

-----------------------------------------------------

 

Anyone could help me?

2 REPLIES 2
Greg_Deckler
Super User
Super User

@pedroluccas So, duplicate your query in Power Query Editor. Remove all rows except the SAMPLE CODE column. Do a Remove Duplicates. Create an Index column (Add Columns | Index). You can then either do a merge to get your index or in a DAX calculated column, add a column to your original table that grabs the Index using RELATED.

 

You can find sample code here:

Correlation coefficient - Microsoft Power BI Community

If you have seasonality in your data:

De-Seasonalized Correlation Coefficient - Microsoft Power BI Community


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Thank you very much!

It worked for Slope and R². But now seems that I have other problem.

When I have a huge amount of samples of different products: A1AA/22 (Product A), B1AA/22 (Product B), C1AA/22 (Product C), etc, it generates a index with huge numbers (10,000 or more).

Other scenario is when I wnat to use just an interval of the samples (sample 5 - 12).

Since Intercept formula is b = [(Σy)*(Σx²) - (Σx) * (Σxy)] / [n * (Σx²) - (Σx)²], and now appears the factor (Σx²) on the numerator, huge numbers seems to affect the calculation.

 

See below the calculation and graphs

 

pedroluccas_0-1665057765038.png

pedroluccas_1-1665057840395.png

 

What can I do to make the index change according with my selection and restart the count with the number 1?

 

 

 

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors