The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends September 15. Request your voucher.
Hello,
I have a performance issue with cohort analysis in Power BI. I have a log of sessions for a web site in table ga_sessions. It contains 30 million records and has the following columns
(and also other fields as "ad campaign", "region", etc)
I want to select a cohort of users in past and look how they behave up to present. In T-SQL I write
SELECT s.medium, SUM(s.cost) cost, sum(s.revenue) revenue FROM ( SELECT gid, MIN(time) time FROM ga_sessions WHERE medium = 'email' AND time >= '2016-03-01' AND time < '2016-04-01' GROUP BY gid ) c JOIN ga_sessions s ON c.gid = s.gid AND s.time >= c.time GROUP BY s.medium ORDER BY cost DESC
This query finishes in 300 milliseconds. But I need the same in Power BI:
EVALUATE CALCULATETABLE ( SUMMARIZE ( GENERATE ( SUMMARIZE ( 'Cohort Sessions', 'Cohort Sessions'[gid], "c_time", MIN ( 'Cohort Sessions'[time] ) ), FILTER ( Sessions, Sessions[gid] = 'Cohort Sessions'[gid] && Sessions[time] >= [c_time] ) ), Sessions[medium], "cost", SUM ( Sessions[cost] ), "revenue", SUM ( Sessions[revenue] ), ), 'Cohort Sessions'[cohort medium] = "email", 'Cohort Sessions'[time] >= DATE ( 2016, 03, 01 ), 'Cohort Sessions'[time] < DATE ( 2016, 04, 01 ) ) ORDER BY [cost] desc
DAX query finishes in 7 seconds (9 seconds on cold cache)!
If I comment part Sessions[time] >= [c_time] the query finishes in 1.7 sec, but this condition is very important for me.
I created 2 instance of ga_sessions table ('Sessions' and 'Cohort Sessions') in Power BI since I need to provide user with distinct filters for cohort definition and futher sessions.
How I can optimize DAX query ?
Any thought around doing the join and such in your query ("M") versus in DAX?
User | Count |
---|---|
65 | |
62 | |
60 | |
53 | |
28 |
User | Count |
---|---|
181 | |
82 | |
67 | |
48 | |
44 |