Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedDon't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.
Hello
I am working with a data sample that contains numeric values on "y" axis (pH results) and text values on "x" axis (sample/batch code) and need to build a linear regression analysis.
So far, I found a website "IterationInsights" that published an article "How to do simple linear regression in Power BI",
which really helped me to analyse using date, but not sample code (since it is a text). I need to evaluate the trend between 2 samples, independent of when they were collected.
Note that if x is a Date, we have: y = 0.0023x - 99.065, r² = 0.2566
But if x is the sample, we have: y = 0.0297x + 5.5252, r² = 0.3024
How am I able to build a variable table that converts the sample code to a numeric sequence (A1AA/22 = 1, A2AA/22 = 2, ...., A21AA/22 = 21)?
Besides, it's possible to filter this table? Ex: analyse A12AA/22 to A20AA/22 (interval which would automatically be converted to the numbers: A12AA/22 = 1, A13AA/22 = 2, ...., A20AA/22 = 9, always starting with 1).
Since slope formula is: [n(Σxy) - (Σx)(Σy)] / [n(Σx²) - (Σx)²],
x should be refeered as a sequence of natural numbers representing the samples.
n is the distinct count of the samples.
I suppose that I should build a data var table to convert sample code to a number, and use All selected function to filter the samples I'm working with, but I'm really having a hard time to combine these functions and obtain a functional linear regression equation.
I exemplified the solpe formula because I think the same solution would be applied to intercep, r², etc.
For slope, I used the formula:
---------------------------------------------------
SLOPE =
var data = SELECTCOLUMNS(ALLSELECTED(DATA),
"x_values",DATA[DATE],
"y_values",DATA[RESULTS]
)
--Variables to Solve for:
var y_sum = SUMX(data,[y_values])
var x_sum = SUMX(data,[x_values])
var x2 = SUMX(data,[x_values]^2)
var xy = SUMX(data,[x_values]*[y_values])
var row_count = COUNTROWS(data)
--Solve for formula
var Slope = ((row_count * xy) - (x_sum * y_sum))/((row_count*x2)-((x_sum)^2))
return Slope
-----------------------------------------------------
Anyone could help me?
@pedroluccas So, duplicate your query in Power Query Editor. Remove all rows except the SAMPLE CODE column. Do a Remove Duplicates. Create an Index column (Add Columns | Index). You can then either do a merge to get your index or in a DAX calculated column, add a column to your original table that grabs the Index using RELATED.
You can find sample code here:
Correlation coefficient - Microsoft Power BI Community
If you have seasonality in your data:
De-Seasonalized Correlation Coefficient - Microsoft Power BI Community
Thank you very much!
It worked for Slope and R². But now seems that I have other problem.
When I have a huge amount of samples of different products: A1AA/22 (Product A), B1AA/22 (Product B), C1AA/22 (Product C), etc, it generates a index with huge numbers (10,000 or more).
Other scenario is when I wnat to use just an interval of the samples (sample 5 - 12).
Since Intercept formula is b = [(Σy)*(Σx²) - (Σx) * (Σxy)] / [n * (Σx²) - (Σx)²], and now appears the factor (Σx²) on the numerator, huge numbers seems to affect the calculation.
See below the calculation and graphs
What can I do to make the index change according with my selection and restart the count with the number 1?
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.
User | Count |
---|---|
13 | |
12 | |
12 | |
8 | |
7 |
User | Count |
---|---|
16 | |
13 | |
11 | |
11 | |
9 |