Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
praphull
Helper I
Helper I

Dynamically select column based on slicer for regression/correlation calculation

Hi,
I have approx 400 columns & thousand of rows. All column have numerical value. 
My Requirement is that, Two slicers which select dynamically two different column that will represent as x & y-axis in regression equation. Basically I need relation between any of two column out of 400 based on slicer selection.

Unpivot option has challenge for big data. Also aggregation issue during unpivot. 
Kindly help to fix it.
Thanks in advance.

15 REPLIES 15
tamerj1
Super User
Super User

Hi @praphull 

This is higly challenging. If you provide a sample PBIX file with sample data, I would accept the challenge (no guarantees on results) 

 

@Greg_Deckler do you think that would be another use case of TOCSV?

Hi @tamerj1 ,
I am not able to insert pbix file. You can create any of 3-4 column (imagin we have 400+ column) like this. I need to select any of two column for regression equation as x & y-axis. I used LINEST Dax for calculating regression equation.

praphull_0-1696841964177.png

praphull_1-1696842201527.png

 

 

Hi @praphull 
Here I'm using Field Parameters for dynamic selection of the X-Axis and Y-Axis. However, when trying to calculate the slope and intersection dynamically I hit the wall :-DI have a strong feeling that there should be a much simpler way to accomplish the same but this is what I was able to accomplish so far.
Worth mentioning that with hundreds of columns and thousands of rows this solution might not be feasible in terms of performance.
Also note that I don't have very strong visualization skills and I didn't know how how to add the manual trend line to the scattered plot so I just overlapped it with a line chart. Please do not laugh at that 🙂

@Greg_Deckler Appreciate your input on this. Was that over thinking or it is just what it is?

 

1.png

 

Regression Value = 
VAR XColumn = MAX ( 'X-Axis'[Parameter] )
VAR YColumn = MAX ( 'Y_Axis'[Parameter] )
VAR T = ALLSELECTED ( 'Table' )
VAR NumberOfRows = COUNTROWS ( T )
VAR String1 = TOCSV ( T, -1, ",", TRUE )
VAR Items1 = SUBSTITUTE ( String1, UNICHAR ( 10 ), "|" )
VAR T1 = GENERATESERIES ( 1, NumberOfRows, 1 )
VAR T2 = SELECTCOLUMNS ( T1, "@Headers", PATHITEM ( Items1, 1 ), "@Details", PATHITEM ( Items1, [Value] + 1 ) )
VAR T3 =
    GENERATE ( 
        T2, 
        VAR HeaderString = [@Headers]
        VAR HeaderItems = SUBSTITUTE ( HeaderString, ",", "|" )
        VAR DetailString = [@Details]
        VAR DetailItems = SUBSTITUTE ( DetailString, ",", "|" )
        VAR Length = PATHLENGTH ( HeaderItems )
        VAR T4 = GENERATESERIES ( 1, Length, 1 )
        VAR T5 = SELECTCOLUMNS ( T4, "@Header1", PATHITEM ( HeaderItems, [Value] ), "@Detail1", PATHITEM ( DetailItems, [Value] ) )
        VAR T6 = SELECTCOLUMNS ( T4, "@Header2", PATHITEM ( HeaderItems, [Value] ), "@Detail2", PATHITEM ( DetailItems, [Value] ) )
        VAR T7 = FILTER ( T5, [@Header1] = "'Table'[" & XColumn & "]"  )
        VAR T8 = FILTER ( T6, [@Header2] = "'Table'[" & YColumn & "]"  )
        RETURN
            CROSSJOIN ( T7, T8 )
    )
VAR LinestTable = LINESTX ( T3, [@Detail2], [@Detail1] )
VAR Slope = MAXX ( LinestTable, [Slope1] )
VAR Intercept = MAXX ( LinestTable, [Intercept] )
VAR X = 
    MAXX (
        'Table',
        VAR T1 = CALCULATETABLE ( 'Table' )
        VAR String1 = TOCSV ( T1, 1, ",", TRUE )
        VAR Items1 = SUBSTITUTE ( String1, UNICHAR ( 10 ), "|" )
        VAR T2 = SELECTCOLUMNS ( { ( 1, 2 ) }, "@Headers", PATHITEM ( Items1, 1 ), "@Details", PATHITEM ( Items1, 2 ) )
        VAR T3 =
            GENERATE ( 
                T2, 
                VAR HeaderString = [@Headers]
                VAR HeaderItems = SUBSTITUTE ( HeaderString, ",", "|" )
                VAR DetailString = [@Details]
                VAR DetailItems = SUBSTITUTE ( DetailString, ",", "|" )
                VAR Length = PATHLENGTH ( HeaderItems )
                VAR T4 = GENERATESERIES ( 1, Length, 1 )
                RETURN
                    FILTER ( 
                        SELECTCOLUMNS ( 
                            T4, 
                            "@Header", PATHITEM ( HeaderItems, [Value] ),
                            "@Detail", PATHITEM ( DetailItems, [Value] ) 
                        ),
                        [@Header] = "'Table'[" & XColumn & "]" 
                    )
            )
        RETURN
            MAXX ( T3, VALUE ( [@Detail] ) )
    )
RETURN
    Slope * X + Intercept

 

Hi @tamerj1,
Great solution that you provided.
Also I need some suggesions. I am explaining what I did.
I used header as x-axis & y-axis. Then created measures using SWITCH DAX for column selection at both axis. Now, Simply I used craydec regression chart which is working fine, showing equation perfect but in pictorial view. So that finally we have to use LINEST/LINESTX dax.
So, can we create LINEST/LINESTX DAX using measures, otherway what you did.

praphull_0-1696949204467.png

praphull_1-1696949279296.pngpraphull_2-1696949318056.png

 

  

@praphull 

Yes probably you can but

1. I cannot 100% confirm without having the sample file. You share a download link of it.

2. Not sure if having a switch statement of 400 columns is practical. What happens if you add or delete columns? The code would break. 

Hi @tamerj1 ,
One more small thing needs to know, is it option in field parameterto to modify/add more field in case of addition more field in data base. Also can we select all column of table at a time  or have to select one by one.

@praphull 

That needs to be done manually with field parameters. The advantage of field parameters is that they are simple to create and and are aboe to dynamicaly change the tye display name as per the selected column or measure. However, I believe other solution that utilizes DAX to create something similar to a parameter table could be possible but way more complicated. 

@tamerj1 I'll have to take a look at this in greater detail. Been a while since I played around with regression analysis and I haven't really looked much at LINEST function. Thanks for the PBIX. I'll have a look at what you did here, it looks pretty interesting. 

 

I do come back to that even with 400 column and let's say 10,000 rows, that's only 4 million rows unpivoted so shouldn't be that problematic and shouldn't increase the size of the dataset that tremendously since there should be pretty decent compression. Would likely simplify this problem greatly. Always up for a challenge though!


Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler 
You are absolutely right and I am always with simplest possible solution. However, I have noticed that in many cases people where only allowed to create live connections to pre-published datasets that they don't have the privilege to edit and thus trying very hard to deal with their unpivotted data to perform some sort of aggregation across columns.

 

One more thing I would like to add is that unpivoting 400 columns using power query is not that efficient. The last time I gave that advise to person in the community he ended up with total crash of his pbix file.

@tamerj1 OK, first of all, this is very impressive code to do the column selection. Extremely creative. Where I *think* it could be improved potentially is in your calculation of X and Y. It occurs to me that after you calculate the slope and intercept that you could create a table of the original X and Y values doing something like this:

 

VAR __Table = SELECTCOLUMNS( T3, "Y", [@Detail2], "X", [@Detail1] )
 
Once you have that, you could use ADDCOLUMNS to calculate the estimated Y for each X using the slope and intercept and X value for each row. Then you could just grab the current X value in the visual and essentially "lookup" the corresponding calculated Y value (estimated) versus the actual measurement. What do you think?

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler 

That is a great observation! Wow! That would potentially improve the performance significantly and make code simpler and shorter. Let me try to implement it first thing in the morning. 
Thank you so much. Allways appreciate your input. 

@tamerj1 Now that I'm thinking about it, you could actually just grab the current value of X using MAX and calculate the estimated Y using the slope and intercept, no need for the intermediate table. I *think*!


Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler 
I'm not sure I fully understand this point. Please clarify further.
I wasn't able to try your previous suggestions today. I hope will be able to do it tomorrow. I'm so exited to finding a more simplified solution.

@tamerj1 Nevermind that last one, I forgot that you are using field parameters so all you will get back is the value of the parameter "Open", "Closed", etc. not the actual value for the x-axis. I think that might invalidate the previous thought I had as well. If you can't know the current value of the X-Axis then I don't think either approach will work.


Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@tamerj1 Very fair observations. I'm still digging into your code and seeing what it is doing.


Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Fabric Monthly Update - May 2024

Check out the May 2024 Fabric update to learn about new features.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

Top Solution Authors