cancel
Showing results for 
Search instead for 
Did you mean: 

Fabric is Generally Available. Browse Fabric Presentations. Work towards your Fabric certification with the Cloud Skills Challenge.

Reply
pedroluccas
Frequent Visitor

LINEAR REGRESSION - X-AXIS IN TEXT FORMAT

Hello

I am working with a data sample that contains numeric values on "y" axis (pH results) and text values on "x" axis (sample/batch code) and need to build a linear regression analysis.

So far, I found a website "IterationInsights" that published an article "How to do simple linear regression in Power BI", 

which really helped me to analyse using date, but not sample code (since it is a text). I need to evaluate the trend between 2 samples, independent of when they were collected.

Note that if x is a Date, we have: y = 0.0023x - 99.065, r² = 0.2566

But if x is the sample, we have: y = 0.0297x + 5.5252, r² = 0.3024

pedroluccas_0-1664991735357.png

How am I able to build a variable table that converts the sample code to a numeric sequence (A1AA/22 = 1, A2AA/22 = 2, ...., A21AA/22 = 21)?
Besides, it's possible to filter this table? Ex: analyse A12AA/22 to A20AA/22 (interval which would automatically be converted to the numbers: A12AA/22 = 1, A13AA/22 = 2, ....,  A20AA/22 = 9, always starting with 1).

 

Since slope formula is: [n(Σxy) - (Σx)(Σy)] / [n(Σx²) - (Σx)²],

x should be refeered as a sequence of natural numbers representing the samples.

n is the distinct count of the samples.

 

I suppose that I should build a data var table to convert sample code to a number, and use All selected function to filter the samples I'm working with, but I'm really having a hard time to combine these functions and obtain a functional linear regression equation.

I exemplified the solpe formula because I think the same solution would be applied to intercep, r², etc.

 

For slope, I used the formula:

---------------------------------------------------

SLOPE =

 

var data = SELECTCOLUMNS(ALLSELECTED(DATA),

                        "x_values",DATA[DATE],

                        "y_values",DATA[RESULTS]

)

 

--Variables to Solve for:

var y_sum = SUMX(data,[y_values])

var x_sum = SUMX(data,[x_values])

var x2 = SUMX(data,[x_values]^2)

var xy = SUMX(data,[x_values]*[y_values])

var row_count = COUNTROWS(data)

 

--Solve for formula

 

var Slope = ((row_count * xy) - (x_sum * y_sum))/((row_count*x2)-((x_sum)^2))

 

return Slope

-----------------------------------------------------

 

Anyone could help me?

2 REPLIES 2
Greg_Deckler
Super User
Super User

@pedroluccas So, duplicate your query in Power Query Editor. Remove all rows except the SAMPLE CODE column. Do a Remove Duplicates. Create an Index column (Add Columns | Index). You can then either do a merge to get your index or in a DAX calculated column, add a column to your original table that grabs the Index using RELATED.

 

You can find sample code here:

Correlation coefficient - Microsoft Power BI Community

If you have seasonality in your data:

De-Seasonalized Correlation Coefficient - Microsoft Power BI Community


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

Thank you very much!

It worked for Slope and R². But now seems that I have other problem.

When I have a huge amount of samples of different products: A1AA/22 (Product A), B1AA/22 (Product B), C1AA/22 (Product C), etc, it generates a index with huge numbers (10,000 or more).

Other scenario is when I wnat to use just an interval of the samples (sample 5 - 12).

Since Intercept formula is b = [(Σy)*(Σx²) - (Σx) * (Σxy)] / [n * (Σx²) - (Σx)²], and now appears the factor (Σx²) on the numerator, huge numbers seems to affect the calculation.

 

See below the calculation and graphs

 

pedroluccas_0-1665057765038.png

pedroluccas_1-1665057840395.png

 

What can I do to make the index change according with my selection and restart the count with the number 1?

 

 

 

 

Helpful resources

Announcements
PBI November 2023 Update Carousel

Power BI Monthly Update - November 2023

Check out the November 2023 Power BI update to learn about new features.

Community News

Fabric Community News unified experience

Read the latest Fabric Community announcements, including updates on Power BI, Synapse, Data Factory and Data Activator.

Power BI Fabric Summit Carousel

The largest Power BI and Fabric virtual conference

130+ sessions, 130+ speakers, Product managers, MVPs, and experts. All about Power BI and Fabric. Attend online or watch the recordings.

Top Solution Authors
Top Kudoed Authors